A newer version of this tutorial is available.
There are three ways to scrape image URLs from a website with Octoparse. Choose one according to the data format you need.
Format 1: All extracted image URLs of a webpage are laid out in the same row, each in a different column.
To extract each image URL into its own column, repeat the "click" and "extract" steps for every image on the page.
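To make the Format 1 layout concrete, here is a minimal Python sketch of what "same row, different columns" looks like in an exported CSV. It is only an illustration of the data shape, not how Octoparse works internally; the function name `to_wide_row`, the column names `image_1`, `image_2`, ... and the example.com URLs are all assumptions made for this example.

```python
import csv
import io


def to_wide_row(image_urls):
    """Lay out one page's image URLs as a single row, one column per URL."""
    # Hypothetical column names; a real export may label columns differently.
    header = [f"image_{i}" for i in range(1, len(image_urls) + 1)]
    return header, list(image_urls)


# Hypothetical URLs standing in for the images clicked on one page.
urls = [
    "https://example.com/img/a.jpg",
    "https://example.com/img/b.jpg",
    "https://example.com/img/c.jpg",
]

header, row = to_wide_row(urls)

# Write the header and the single data row to an in-memory CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(row)
wide_csv = buffer.getvalue()
print(wide_csv)
```

Each additional image adds one more column, which is why this format suits pages that always show the same small number of images.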
Format 2: All the image URLs on the same webpage are exported in one column, each in a different row.
If we build a loop item to scrape all the image URLs on one page, each image URL is extracted into the same column but a separate row.
Octoparse will automatically detect all the other similar images on the current page. The one you already selected will be highlighted in green, while the others will be highlighted in red.
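The loop-item idea behind Format 2 can be sketched outside Octoparse as follows: visit every image on the page and emit one row per image. This sketch uses only Python's standard `html.parser` module; the class name `ImageSrcCollector`, the `image_url` column name, and the sample HTML are assumptions for illustration, not part of Octoparse.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip images that have no src attribute
                self.sources.append(src)


# Hypothetical page fragment standing in for a real webpage.
sample_html = """
<ul>
  <li><img src="https://example.com/img/a.jpg" alt="a"></li>
  <li><img src="https://example.com/img/b.jpg" alt="b"></li>
  <li><img alt="no source"></li>
</ul>
"""

collector = ImageSrcCollector()
collector.feed(sample_html)

# One row per image, all in the same (assumed) "image_url" column.
long_rows = [{"image_url": src} for src in collector.sources]
for row in long_rows:
    print(row)
```

Because the number of rows grows with the number of images, this layout copes with pages whose image count varies.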
Format 3: All the image URLs are exported in one cell by using the RegExp Tool.
If we need all the image URLs of a product or webpage extracted into a single cell, we can use the Octoparse RegExp Tool to pick out all the image URLs from the source code of the webpage.
The selected area is highlighted in green.
In the source code, every image URL starts with "https://" and ends with "jpg", so the RegExp Tool can match all of them using those two markers.
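The same "starts with https://, ends with jpg" rule can be tried out in Python's `re` module, as a rough sketch of the matching logic. The exact pattern below, the `"; "` separator, the function name `image_urls_in_one_cell`, and the sample source code are assumptions for this example; the pattern you enter in the Octoparse RegExp Tool may need to differ.

```python
import re

# Assumed pattern: "https://", then a run of characters that cannot be part
# of an attribute value (no whitespace, quotes, or angle brackets), ending
# in "jpg". Greedy matching keeps the longest such run within one URL.
IMAGE_URL_PATTERN = re.compile(r"""https://[^\s"'<>]+jpg""")


def image_urls_in_one_cell(source_code, separator="; "):
    """Return every matching image URL joined into a single string."""
    urls = IMAGE_URL_PATTERN.findall(source_code)
    # Drop repeated URLs while keeping first-seen order.
    unique_urls = list(dict.fromkeys(urls))
    return separator.join(unique_urls)


# Hypothetical source code standing in for a real product page.
sample_source = """
<div class="gallery">
  <img src="https://example.com/img/a.jpg">
  <img src="https://example.com/img/b.jpg">
  <a href="https://example.com/page.html">details</a>
  <img src="https://example.com/img/a.jpg">
</div>
"""

cell = image_urls_in_one_cell(sample_source)
print(cell)
# prints: https://example.com/img/a.jpg; https://example.com/img/b.jpg
```

Note that this pattern only finds URLs ending in "jpg"; images served as "png" or "jpeg" would need a broader pattern.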
To learn more about the Octoparse RegExp Tool, please refer to the following tutorials:
To download the images from the URLs we have already scraped, please refer to the article "How to download images from a list of URLs?"
Author: Erika F