Step-by-step tutorials for you to get started with web scraping


How can I extract data with a list of URLs?

Thursday, August 16, 2018

1. Understand the List of URLs loop mode in Octoparse

When the data you want spans multiple pages that share the same page structure, you can enter the URLs of these pages into Octoparse to set up a loop. Octoparse will then load the URLs one by one and scrape the data from each page.
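Conceptually, the loop behaves like the sketch below. This is not Octoparse's internal code: `scrape_pages`, `fetch`, and `extract` are hypothetical names used only for illustration, and the example swaps real page loading for an in-memory stand-in so that it runs without network access.

```python
from typing import Callable, Dict, Iterable, List


def scrape_pages(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    extract: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Load each URL in order and apply the same extraction rule to every page."""
    rows = []
    for url in urls:
        page = fetch(url)      # load the page for this loop item
        row = extract(page)    # one rule fits all pages because they share a structure
        row["url"] = url       # remember which page the row came from
        rows.append(row)
    return rows


# Hypothetical stand-ins (assumptions) so the sketch runs offline.
FAKE_PAGES = {
    "https://example.com/item/1": "Title: Red mug",
    "https://example.com/item/2": "Title: Blue mug",
}


def fake_fetch(url: str) -> str:
    return FAKE_PAGES[url]


def extract_title(page: str) -> Dict[str, str]:
    return {"title": page.split(": ", 1)[1]}


rows = scrape_pages(list(FAKE_PAGES), fake_fetch, extract_title)
```

The point of the sketch is that a single extraction rule is reused for every URL, which is why the pages in the list need to share the same structure.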

 

 

2. Maximum number of URLs you can input

We suggest you add no more than 20,000 URLs to one task. The exact limit varies slightly with the length of the URLs.

Octoparse shows an error message when you exceed the limit.
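If your list is longer than the suggested 20,000 URLs, one option is to split it into several batches before creating tasks, one batch per task. The sketch below is a minimal illustration: `split_into_batches` and `MAX_URLS_PER_TASK` are hypothetical names, and the `example.com` URLs are placeholders.

```python
from typing import List

# Suggested ceiling from this tutorial; an assumption, not an Octoparse constant.
MAX_URLS_PER_TASK = 20_000


def split_into_batches(
    urls: List[str], batch_size: int = MAX_URLS_PER_TASK
) -> List[List[str]]:
    """Split a URL list into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Placeholder data: 45,000 made-up URLs.
urls = [f"https://example.com/page/{n}" for n in range(1, 45_001)]
batches = split_into_batches(urls)
batch_sizes = [len(batch) for batch in batches]
```

Because the real limit depends on URL length, a smaller `batch_size` may be the safer choice when the URLs are long.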

 

 

3. Start a new task with a list of URLs

- Enter your list of URLs

When you add more than one line of URLs to the Extraction URL box, Octoparse enters the List of URLs loop mode by default and creates a Loop Item automatically.
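Before pasting, it can help to tidy the list so that each line holds exactly one URL. The sketch below assumes the URLs are already in a Python list; `to_url_lines` is a hypothetical helper that trims whitespace, drops blank entries and duplicates, and joins the rest with line breaks.

```python
from typing import Iterable


def to_url_lines(raw_urls: Iterable[str]) -> str:
    """Return the URLs as newline-separated text, one cleaned URL per line."""
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip blank entries and repeated URLs
        seen.add(url)
        cleaned.append(url)
    return "\n".join(cleaned)


# Placeholder input with stray spaces, an empty entry, and a duplicate.
raw = [
    " https://example.com/a ",
    "",
    "https://example.com/b",
    "https://example.com/a",
]
text = to_url_lines(raw)
```

The resulting `text` can be copied into the Extraction URL box as is.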

 

- Set Wait before execution

To make sure each page finishes loading, you can set a wait time before the action is executed (2 seconds is usually enough).

Advanced Options > Wait before execution
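The idea behind this setting is a fixed pause before each action runs. The sketch below is an illustration only, not how Octoparse implements it: `run_after_wait` is a hypothetical helper, and the `sleep` parameter exists so the example can record the wait instead of actually pausing.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_after_wait(
    action: Callable[[], T],
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Pause for wait_seconds, then run the action and return its result."""
    sleep(wait_seconds)  # give the page time to finish loading
    return action()


# Record the requested wait rather than sleeping, so the example runs instantly.
waits: List[float] = []
result = run_after_wait(lambda: "extracted", wait_seconds=2.0, sleep=waits.append)
```

A fixed wait is simple but slows every step by the same amount, so a longer value is a trade-off between reliability and total run time.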

 

4. Edit the list of URLs you entered

After you have entered the list of URLs, you can still modify it.

Advanced Options > List of URLs

 

 
