How can I extract data with a list of URLs?
Thursday, August 16, 2018
1. Understand the List of URLs loop mode in Octoparse
When the data you want spans multiple pages that share the same page structure, you can enter the URLs of those pages into Octoparse to set up a loop. Octoparse then loads the URLs one by one and scrapes the data from each page.
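Octoparse does this without any code. If you want to see the same idea in code, here is a minimal Python sketch of such a loop. It is not Octoparse's API: `scrape_url_list`, `fetch_html` and `extract_title` are hypothetical helpers. The sketch fetches each URL in turn with the standard library's `urllib.request.urlopen` and applies one extractor function to every page, which only works because the pages share the same structure.

```python
import re
from typing import Callable, Iterable
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Download one page and decode it as UTF-8 (simplified: ignores the declared charset)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_title(html: str) -> dict:
    """Example extractor: pull the <title> text. Relies on every page using the same markup."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return {"title": match.group(1).strip() if match else None}


def scrape_url_list(
    urls: Iterable[str],
    extract: Callable[[str], dict] = extract_title,
    fetch: Callable[[str], str] = fetch_html,
) -> list[dict]:
    """Load the URLs one by one and run the same extractor on each page."""
    rows = []
    for url in urls:
        row = extract(fetch(url))
        row["source_url"] = url  # remember which page each row came from
        rows.append(row)
    return rows
```

Calling `scrape_url_list(my_urls)` with a list of your own page URLs returns one dict per URL. The `fetch` parameter lets you swap in a different downloader, or a stub when testing.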
2. Maximum number of URLs you can enter
We suggest adding no more than 20,000 URLs to one task. The exact limit varies slightly depending on the length of the URLs.
If you exceed the limit, Octoparse displays an error message.
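If your list is longer than the suggested 20,000 URLs, one workaround is to split it across several tasks. Below is a minimal Python sketch of that split. The `split_into_batches` name is hypothetical, and the 20,000 default simply mirrors the suggestion above. Because the real limit also depends on URL length, you may need a smaller batch size when your URLs are very long.

```python
def split_into_batches(urls: list[str], max_per_task: int = 20_000) -> list[list[str]]:
    """Split a URL list into consecutive batches, each small enough for one task."""
    if max_per_task <= 0:
        raise ValueError("max_per_task must be a positive integer")
    # Slicing in steps of max_per_task keeps the original URL order.
    return [urls[i : i + max_per_task] for i in range(0, len(urls), max_per_task)]
```

For example, 45,000 URLs become three batches of 20,000, 20,000 and 5,000 URLs, each of which could be pasted into its own task.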
3. Start a new task with a list of URLs
- Enter your list of URLs
When you enter more than one URL (one per line) in the Extraction URL box, Octoparse switches to the List of URLs loop mode by default and creates a Loop Item automatically.
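The rule that one URL goes on each line and more than one line switches on List of URLs mode can be modelled in a few lines of Python. This is an illustrative sketch only. `parse_url_box` and the mode names are hypothetical. Skipping blank lines and trimming whitespace are assumptions about how the box is read, not documented Octoparse behavior.

```python
def parse_url_box(text: str) -> tuple[str, list[str]]:
    """Read one URL per line and decide which mode applies."""
    # Assumption: blank lines and surrounding whitespace are ignored.
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    mode = "list_of_urls" if len(urls) > 1 else "single_url"
    return mode, urls
```

Pasting two lines such as `https://example.com/p1` and `https://example.com/p2` returns the mode `"list_of_urls"` together with both URLs. This corresponds to the point where Octoparse creates a Loop Item.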
- Set Wait before execution
To make sure each page has finished loading before it is scraped, you can set a wait time before the action is executed (2 seconds usually works).
Advanced Options > Wait before execution
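In a hand-written scraper, the same idea is usually a fixed pause between opening a page and extracting from it. The Python sketch below uses a hypothetical `scrape_with_wait` helper. It assumes that the caller-supplied `open_page` function returns a page that may still be loading, as a browser would. It then sleeps for `wait_seconds` (2 by default, matching the suggestion above) before running `extract`.

```python
import time
from typing import Callable, Iterable, TypeVar

Page = TypeVar("Page")
Row = TypeVar("Row")


def scrape_with_wait(
    urls: Iterable[str],
    open_page: Callable[[str], Page],
    extract: Callable[[Page], Row],
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Row]:
    """Open each URL, pause for wait_seconds, then run the extraction action."""
    results = []
    for url in urls:
        page = open_page(url)          # start loading the page
        sleep(wait_seconds)            # wait before the extraction action executes
        results.append(extract(page))  # extract once the page has had time to load
    return results
```

The `sleep` parameter exists only so the delay can be replaced in tests. A fixed delay is the simplest option, but it trades speed for reliability: longer waits are safer on slow pages and make the whole run slower.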
4. Edit the list of URLs you entered
After entering the list of URLs, you can still modify it.
Advanced Options > List of URLs
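If you keep your URL list outside Octoparse (for example in a text file) and edit it before pasting it back, a small helper can apply the changes consistently. The sketch below is hypothetical, since `edit_url_list` is not part of Octoparse. It appends new URLs and drops unwanted ones. As an extra assumption, it also removes duplicates while keeping the original order.

```python
from typing import Iterable


def edit_url_list(
    urls: Iterable[str],
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> list[str]:
    """Return an edited copy: new URLs appended, removed and duplicate URLs dropped."""
    to_remove = set(remove)
    seen: set[str] = set()
    edited = []
    for url in [*urls, *add]:
        if url in to_remove or url in seen:
            continue  # skip URLs marked for removal and repeats
        seen.add(url)
        edited.append(url)
    return edited
```

The result can be joined with `"\n".join(...)` and pasted back into Advanced Options > List of URLs, one URL per line.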