Scrape Data from Multiple URLs Using Octoparse
Thursday, April 22, 2021
If you are working on a big project that requires lots of data, working knowledge of web scraping tools is definitely an asset. Today we are going to see scenarios where you need to scrape data from multiple URLs and how you can do it in an easy way.
The Need for Multi-URL Scraping
Multi-URL scraping is required mostly in three scenarios:
- When you need to collect data that extends across multiple pages
- When you have an existing list of URLs you want to crawl data from
- When you first extract the URLs of all the pages you want data from, then crawl that list in a second step
For example, when you scrape product listings from an e-commerce site like Amazon, you may need to loop over multiple pages under one category or search query. These pages very likely share the same page structure, so one set of extraction rules can be reused on each of them.
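As a rough illustration of what looping over result pages means, the sketch below builds one URL per page for a query. The base URL and the parameter names `q` and `page` are hypothetical; real sites use their own URL schemes, so check the address bar while paging through results.

```python
from urllib.parse import urlencode


def build_page_urls(base_url: str, query: str, pages: int) -> list[str]:
    """Return one search-results URL per page, numbered from 1."""
    # "q" and "page" are hypothetical parameter names; real sites differ.
    return [
        f"{base_url}?{urlencode({'q': query, 'page': page})}"
        for page in range(1, pages + 1)
    ]


urls = build_page_urls("https://example.com/search", "wireless mouse", 3)
for url in urls:
    print(url)
```

Because the pages share one structure, the resulting list can be handed to a single extraction routine.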
Another example is aggregating data from multiple websites, such as news or financial publications. You may first gather the URLs of all the articles you need and then run the scraping task on that list.
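The "gather the URLs first" step might look something like the following sketch, which collects every link on an index page using only the Python standard library. The sample markup and the `example.com` addresses are made up for illustration; a real page would usually need a narrower rule than "every link".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Inline sample markup stands in for a downloaded index page.
sample_html = """
<ul>
  <li><a href="/news/markets-today">Markets today</a></li>
  <li><a href="https://example.org/analysis/rates">Rates analysis</a></li>
</ul>
"""
article_urls = extract_links(sample_html, "https://example.com/news/")
print(article_urls)
```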
Ways to Scrape Data from Multiple URLs
- Computer Language (Coding)
If you come from a technical background and have good programming knowledge, you can use Python packages such as BeautifulSoup, Scrapy, or Selenium to build your own multi-URL scraper. But scripting can be intimidating for non-coders, and even for developers the complexity grows when the target pages have different structures.
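To give a sense of the coding route, here is a minimal sketch of a multi-URL scraper that visits each URL and records the page title. It uses only the standard library rather than the packages named above, and the function names, the `User-Agent` string, and the stub pages are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Capture the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as text."""
    request = Request(url, headers={"User-Agent": "multi-url-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_titles(urls, fetch=fetch_html):
    """Return one {"url", "title", "error"} row per URL."""
    rows = []
    for url in urls:
        try:
            parser = TitleParser()
            parser.feed(fetch(url))
            rows.append({"url": url, "title": parser.title.strip(), "error": ""})
        except Exception as exc:  # keep going when a single page fails
            rows.append({"url": url, "title": "", "error": str(exc)})
    return rows


# Offline demo: a dictionary lookup replaces real network access.
stub_pages = {"https://example.com/a": "<html><head><title> Page A </title></head></html>"}
print(scrape_titles(["https://example.com/a", "https://example.com/missing"],
                    fetch=stub_pages.__getitem__))
```

Passing the fetch function as a parameter keeps the loop testable without a network connection; dropping the `fetch` argument makes it use `fetch_html` against live pages. Extracting anything beyond a title is where the per-site complexity mentioned above begins.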
- Web Scraping tool (Without Coding)
If you are not proficient with coding, a web scraping tool will be more suitable and will make scraping easier for you. First, you need to pick the right tool. There are many on the market, such as Mozenda, Outwit Hub, and Scrapinghub, but not all of them combine the features you are likely to need, such as pre-built templates, free unlimited crawls, API integration, cloud-based extraction, and large-scale scraping, at an affordable price. Therefore, we recommend Octoparse, a free and powerful web scraper that can extract data from any website.
Scrape Data from Multiple URLs using Octoparse Template Mode
Template Mode is valuable for those who prefer to skip the learning curve and need to extract data fast from some of the most popular websites, such as Amazon, Instagram, Twitter, YouTube, Booking, TripAdvisor, Yellowpage, Walmart, and many more.
We will walk through the steps necessary to set up a web scraper to scrape data from multiple URLs using the Octoparse template.
Step-1: Select “Task Templates” from the home screen and pick a template. Select “Try it”.
Step-2: Type up to 3 keywords in the “keywords” field. In Template Mode you don't need to supply a URL for every page: to scrape 5 pages of results, just type 5 in the “Number of pages” field.
Step-3: It's now time to “Save and Run” the task in the cloud. Octoparse will now go and scrape the data you’ve selected. You will be notified on the Dashboard when it’s done. You can download your data as CSV, Excel, JSON, or HTML.
Sample data scraped by Octoparse Amazon scraper
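Once downloaded, a CSV export can be processed further in code if needed. The sketch below reads CSV rows into dictionaries with Python's `csv` module. The column names (`title`, `price`, `url`) and the sample rows are invented for illustration and will not match a real export exactly.

```python
import csv
import io


def load_rows(csv_file) -> list[dict]:
    """Read a CSV export into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(csv_file))


# Inline text stands in for a downloaded file; the columns are made up.
export = io.StringIO(
    "title,price,url\n"
    "Wireless Mouse,19.99,https://example.com/p/1\n"
    "USB Hub,24.50,https://example.com/p/2\n"
)
rows = load_rows(export)
cheapest = min(rows, key=lambda row: float(row["price"]))
print(cheapest["title"])
```

For a file on disk, the same function accepts a handle from something like `open("export.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.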
Scrape Data from Multiple URLs using Octoparse Advanced Mode
Advanced Mode offers more customization and flexibility than Template Mode. It lets you build a crawler from scratch for more complex websites, and its auto-detection feature makes the job easier.
Now let's build the crawler using advanced mode with the necessary steps.
Step1. Click "+New" and select "Advanced Mode" to create a new task.
Step2. Paste the list of URLs in the textbox and click "Save URL".
Step3. After you click "Save URL", a "Loop URLs" item (which loops through each URL in the list) is automatically created in the workflow.
Step4. Click the "Go To Web Page" action. Under "Before Page render", set the “wait before action” time to 2 seconds so that each page can finish loading before the next action runs.
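For readers curious what these steps correspond to in code, the sketch below cleans a pasted list of URLs and then visits each one with a pause beforehand. It is only an analogy under stated assumptions, not a description of how Octoparse works internally; the function names, the stub `visit` callback, and the sample URLs are hypothetical.

```python
import time
from urllib.parse import urlparse


def parse_url_list(text: str) -> list[str]:
    """Turn pasted text (one URL per line) into a clean, de-duplicated list."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip blank lines and anything that is not a web URL
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def loop_urls(urls, visit, wait_seconds: float = 2.0, sleep=time.sleep):
    """Call visit(url) for each URL, pausing before each one."""
    results = []
    for url in urls:
        sleep(wait_seconds)  # comparable in spirit to a "wait before action"
        results.append(visit(url))
    return results


pasted = """
https://example.com/page/1
https://example.com/page/2

https://example.com/page/1
not a url
"""
urls = parse_url_list(pasted)
# A stub visit function and a no-op sleep keep this demo instant and offline.
print(loop_urls(urls, visit=len, sleep=lambda seconds: None))
```

In real use, `visit` would download and parse the page, and the default `time.sleep` would provide the 2-second pause.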
And that’s it! Now you know how to scrape data from multiple URLs using Octoparse. We hope this article helped, and don’t forget to try scraping other sites too. If you run into any trouble, feel free to contact support at the Octoparse Help Center.