Scrape Data from Multiple URLs Using Octoparse

Thursday, April 22, 2021

If you are working on a large project that requires a lot of data, working knowledge of web scraping tools is a real asset. In this post, we look at the scenarios where you need to scrape data from multiple URLs and how you can do it easily.

 

The Need for Multi-URL Scraping

Multi-URL scraping is required mostly in three scenarios:

  1. When you need to collect data that extends across multiple pages
  2. When you have an existing list of URLs you want to crawl data from
  3. When you first extract the URLs of all the web pages you want data from, and then crawl that list in a second step

 

For example, when you scrape product listing information from e-commerce sites like Amazon, you may need to loop over multiple pages under one category or search query. These pages very likely share the same page structure.

Another example is when you need to aggregate data from multiple websites, such as news or financial publications. You may first gather the URLs of all the articles and run the scraping task on them later.

 

Ways to Scrape Data from Multiple URLs

 

  • Programming Language (Coding)

If you have a technical background and good programming knowledge, you can use Python packages such as BeautifulSoup, Scrapy, or Selenium to build your own multi-URL scraper. But scripting can be intimidating for non-coders, and even for developers the complexity grows when the web pages have different structures.
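As a rough illustration of the coding route, here is a minimal Python sketch that loops over a list of URLs and pulls one field (the page title) from each. It uses only the standard library instead of BeautifulSoup or Scrapy so that it runs without extra installs. The names TitleParser, fetch_html, and scrape_titles are made up for this example, and a real scraper would also need error handling and site-specific parsing.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_titles(urls, fetch=fetch_html):
    """Visit each URL in turn and return a list of (url, title) pairs."""
    results = []
    for url in urls:
        parser = TitleParser()
        parser.feed(fetch(url))
        results.append((url, parser.title.strip()))
    return results
```

Calling scrape_titles with a list of page URLs returns one (url, title) pair per page. The fetch argument can be swapped for another download function, which also makes the loop easy to try out on saved HTML without touching the network.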

 

  • Web Scraping tool (Without Coding)

If you are not proficient in coding, a web scraping tool will be more suitable and make scraping easier. First, you need to pick the right tool. There are many on the market, such as Mozenda, OutWit Hub, and Scrapinghub. But not all of them offer every feature you may need, such as pre-built templates, free unlimited crawls, API integration, cloud-based extraction, and large-scale scraping, at an affordable price. Therefore, we recommend Octoparse, a free and powerful web scraper that can extract data from any website.

 

Octoparse provides two ways to scrape data from multiple URLs: Template Mode and Advanced Mode. Let's look at each one in more detail.

 

Scrape Data from Multiple URLs using Octoparse Template Mode

Template Mode is valuable for those who prefer to skip the learning curve and need to extract data quickly from popular websites such as Amazon, Instagram, Twitter, YouTube, Booking, TripAdvisor, Yellowpage, Walmart, and many more.

 


 

We will walk through the steps needed to set up a scraper that collects data from multiple URLs using an Octoparse template.

Step-1: Select “Task Templates” from the home screen and pick a template. Select “Try it”.

 


 

Step-2: Type up to 3 keywords in the “keywords” field. In Template Mode you don't need to supply a separate URL for each page: to scrape 5 pages of results, just type 5 in the “Number of pages” field.

 


 

Step-3: Now “Save and Run” the task in the cloud. Octoparse will scrape the data you’ve selected and notify you on the Dashboard when it’s done. You can then download your data as CSV, Excel, JSON, or HTML.

 


Sample data scraped by Octoparse Amazon scraper
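For comparison, the “Number of pages” setting in Step-2 stands in for what a script would do by generating one URL per results page. The Python sketch below shows that idea under a loud assumption: the target site accepts the keyword and page number as query parameters named k and page. Those parameter names, the example.com address, and the build_page_urls function are hypothetical and only illustrate the pattern; this is not how Octoparse builds its requests.

```python
from urllib.parse import urlencode


def build_page_urls(base_url, keyword, pages, keyword_param="k", page_param="page"):
    """Return one search URL per results page, numbered from 1 to `pages`.

    The parameter names are assumptions; check the real site's URL
    pattern before relying on them.
    """
    urls = []
    for page_number in range(1, pages + 1):
        query = urlencode({keyword_param: keyword, page_param: page_number})
        urls.append(f"{base_url}?{query}")
    return urls


# Example: five pages of results for one keyword on a placeholder site.
page_urls = build_page_urls("https://example.com/s", "usb cable", 5)
```

The resulting list can be fed to any loop that visits URLs one by one, or pasted into a tool that accepts a list of URLs.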

 

Scrape Data from Multiple URLs using Octoparse Advanced Mode

Advanced Mode offers more customization and flexibility than Template Mode. It lets you build a crawler from scratch for more complex websites, and it also has an auto-detection feature that makes the job easier.

 

Now let's build a crawler in Advanced Mode, step by step.

Step1. Click "+New" and select "Advanced Mode" to create a new task.


 

Step2. Paste the list of URLs in the textbox and click "Save URL".


 

Step3. After clicking "Save", a "Loop URLs" action (which loops through each URL in the list) is automatically created in the workflow.


 

Step4. Click the "Go To Web Page" action. Under "Before Page render", set the “wait before action” time to 2 seconds to avoid interrupting the page load.

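To make the workflow above more concrete, here is a small Python analogy of what pasting a URL list, looping over it, and waiting before each action amount to. This is only a sketch of the concept, not Octoparse's implementation. The function names clean_url_list and loop_urls are invented for this example, and visit stands for whatever you do with each page.

```python
import time


def clean_url_list(raw_text):
    """Turn pasted text (one URL per line) into an ordered list
    without blank lines or duplicates."""
    seen = set()
    urls = []
    for line in raw_text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def loop_urls(urls, visit, wait_seconds=2.0, sleep=time.sleep):
    """Call visit(url) for every URL, pausing wait_seconds before each call.

    The pause plays the role of the "wait before action" setting;
    `sleep` is injectable so the loop can be tried without real delays.
    """
    results = []
    for url in urls:
        sleep(wait_seconds)
        results.append(visit(url))
    return results
```

With the default arguments, loop_urls(clean_url_list(pasted_text), visit) pauses for 2 seconds before handling each URL, mirroring the value used in Step4.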

 

Closing Thoughts

And that’s it! Now you know how to scrape data from multiple URLs using Octoparse. We hope this article helped, and don’t forget to try scraping other sites too. If you run into any trouble, feel free to contact support at the Octoparse help center.

 

Author: Kajal

 

 
