Step-by-step tutorials for you to get started with web scraping

Scrape Image URLs from a Website

Monday, January 14, 2019

There are three ways to scrape image URLs from a website with Octoparse. Which one to choose depends on the data format we need.

 

 

Format 1: All extracted image URLs of a webpage are laid out in the same row but different columns.

 

To extract each image URL into its own column, repeat the following "click" and "extract" steps for every image we want.

  • Click the image we want on the web page.
  • Select "Extract URL of the selected image" on the "Action Tips".
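For readers who want to see what this layout looks like as data, here is a minimal Python sketch of Format 1, written outside Octoparse. It is not how Octoparse works internally. The sample HTML, the example.com URLs, the `ImgSrcCollector` class and the `image_url_N` column names are all assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "IMG" and "img" both arrive as "img".
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


# Hypothetical page fragment; a real page would be fetched or saved first.
SAMPLE_HTML = """
<div class="product">
  <img src="https://example.com/img/a.jpg">
  <img src="https://example.com/img/b.jpg">
  <img src="https://example.com/img/c.jpg">
</div>
"""


def urls_in_one_row(html):
    """Return CSV text with one data row and one column per image URL."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()

    # Assumed column naming: image_url_1, image_url_2, ...
    header = [f"image_url_{i}" for i in range(1, len(parser.urls) + 1)]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerow(parser.urls)
    return buffer.getvalue()


print(urls_in_one_row(SAMPLE_HTML))
```

Each image becomes its own column, so the number of columns grows with the number of images selected.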

 

 

 

Tips!

  • When you select an image, the selected tag should be "IMG". Octoparse normally identifies the tag of a selected item automatically, so there is usually no need to adjust it. In some cases, though, we need to change the tag at the bottom of "Action Tips".

     

 

Format 2: All the Image URLs on the same webpage are exported in one column but different rows.

 

 

If we build a loop item to scrape all the image URLs on one page, every image URL is extracted into the same column, each in its own row.

  • Click an image in the built-in browser

Octoparse will automatically detect all other similar images on the current page. The one you already selected will be highlighted in green while others will be highlighted in red.

  • Click "Select all" on the "Action Tips"
  • Click "Extract image URLs in a loop"
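The loop layout can also be sketched in plain Python, as a rough illustration of Format 2 rather than a description of Octoparse itself. The sample HTML, the example.com URLs, the `ImgSrcCollector` class and the `image_url` column name are assumptions.

```python
import csv
import io
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


# Hypothetical page fragment with several similar images.
SAMPLE_HTML = """
<ul class="gallery">
  <li><img src="https://example.com/img/a.jpg"></li>
  <li><img src="https://example.com/img/b.jpg"></li>
  <li><img src="https://example.com/img/c.jpg"></li>
</ul>
"""


def urls_in_one_column(html):
    """Return CSV text with a single column and one row per image URL."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image_url"])  # assumed column name
    # Loop over the similar items, writing one row for each image.
    for url in parser.urls:
        writer.writerow([url])
    return buffer.getvalue()


print(urls_in_one_column(SAMPLE_HTML))
```

Because the loop emits one row per image, the column count stays fixed no matter how many images the page contains.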

 

 

 

Format 3: All the image URLs are exported in one cell by using RegExp Tool.

 

If we need all the image URLs of a product or webpage extracted into a single cell, we can use the Octoparse RegExp Tool to pick all the image URLs out of the webpage's source code.

  • Click any spot on the targeted web page.

The selected area will be highlighted in green.

  • Enlarge the selected area by clicking the button at the bottom of the "Action Tips".
  • Click "Extract inner HTML of the selected element".
  • Re-format the extracted data with Octoparse RegExp Tool.

In this page's source code, all the image URLs start with "https://" and end with "jpg", so we can match all of them with the RegExp Tool.

  • Click "Customize data field"
  • Select "Refine extracted data"
  • Select "Match with regular expression"
  • Click "Try RegExp Tool"
  • Check "Start with" and "Include start" and enter "https://" in the following box
  • Check "End with" and "Include end" and enter "jpg" in the following box
  • Click "Generate"
  • Check "Match all" and click "Match"
  • Click "Apply"
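To make the "start with / end with / match all" idea concrete, here is a small sketch using Python's `re` module. The pattern below is an assumption about what a comparable expression could look like, not the expression the RegExp Tool actually generates. The sample HTML, the example.com URLs and the "; " separator used to join the matches into one cell are also hypothetical.

```python
import re

# Starts with "https://", ends with "jpg", both included in the match.
# The character class stops at whitespace, quotes and angle brackets so that
# an unrelated link earlier in the HTML is not swallowed into a match;
# this restriction is an addition made for this sketch.
IMAGE_URL_PATTERN = re.compile(r"""https://[^\s"'<>]*?jpg""")

# Hypothetical inner HTML of the selected element.
SAMPLE_INNER_HTML = """
<div class="product">
  <a href="https://example.com/help">Help</a>
  <img src="https://example.com/img/front.jpg" alt="front">
  <img src="https://example.com/img/back.jpg" alt="back">
</div>
"""


def extract_image_urls(inner_html):
    """Return every match in the HTML, similar in spirit to "Match all"."""
    return IMAGE_URL_PATTERN.findall(inner_html)


def urls_in_one_cell(inner_html, separator="; "):
    """Join all matched URLs into a single string, i.e. one cell."""
    return separator.join(extract_image_urls(inner_html))


print(urls_in_one_cell(SAMPLE_INNER_HTML))
```

A rule that simply ends at "jpg" stops at the first "jpg" after the start, so a URL with "jpg" in its path would be cut short, and images in other formats such as PNG are skipped. The end condition may need adjusting for other pages.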

 

 

Tips!

To further study the functions of Octoparse RegExp Tool, please refer to the following tutorials:

 

NEXT

To download the images from the URLs we have already scraped, please refer to the article "How to download images from a list of URLs?"

 

Author: Erika F

Editor: Suire M

 
