Scraping Articles from Reuters.com

Thursday, January 5, 2017 9:22 PM



In this tutorial, I will show you how to quickly scrape a bunch of news articles from Reuters.com.

Our data fields include the article title, body text, published date/time, and author name.

Use the sample URL below to follow along:



Step 1. Create a Go to Web Page action - to open the target webpage

  • Enter the sample URL in the search bar on the home screen and click Start


Step 2. Auto-detect the webpage - to create the workflow

  • Click Auto-detect webpage in the Tips panel and wait for the detection to complete
  • Choose the desired auto-detect result (1/3)
  • Check the Paginate to scrape more pages option to see if it works for our webpage
  • Uncheck Add a page scroll
  • Click Create Workflow




  • Click the Click on links to scrape the linked page(s) option



  • Select the data field that holds the linked page URL from the dropdown menu and click Check to see if it works
  • Click Confirm to save the settings




  • Select the first paragraph of the article and choose Select All from the Tips panel
  • Click Extract text of the selected element
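The workflow built in this step has roughly the following shape: a pagination loop, a loop over the article links on each listing page, and an extraction step on every linked page. The Python sketch below only illustrates that structure. It is not how Octoparse works internally, and the page names, dictionaries, and the run_workflow function are hypothetical stand-ins for real web pages.

```python
# Hypothetical in-memory "site": each listing page links to articles and,
# optionally, to a next listing page. These stand in for real web pages.
LISTING_PAGES = {
    "page-1": {"links": ["article-a", "article-b"], "next": "page-2"},
    "page-2": {"links": ["article-c"], "next": None},
}
ARTICLE_PAGES = {
    "article-a": {"title": "Article A", "paragraphs": ["A first.", "A second."]},
    "article-b": {"title": "Article B", "paragraphs": ["B only."]},
    "article-c": {"title": "Article C", "paragraphs": ["C first.", "C second."]},
}


def run_workflow(start_page):
    """Walk listing pages, follow each article link, emit one row per paragraph."""
    rows = []
    page = start_page
    while page is not None:                       # pagination loop
        listing = LISTING_PAGES[page]
        for link in listing["links"]:             # loop over article links
            article = ARTICLE_PAGES[link]         # "open" the linked page
            for paragraph in article["paragraphs"]:
                # Selecting all paragraphs yields one row per paragraph.
                rows.append({"title": article["title"], "paragraph": paragraph})
        page = listing["next"]                    # move to the next listing page
    return rows


rows = run_workflow("page-1")
for row in rows:
    print(row["title"], "|", row["paragraph"])
```

Because each paragraph lands in its own row, an article with several paragraphs spans several rows at this point in the workflow.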


Step 3. Adjust workflow settings

  • Rename the data fields for the first Extract Data action
  • Click the three dots next to the paragraph data field for more settings
  • Click Customize XPath and change the XPath for the paragraph data field to //p[contains(@data-testid,"paragraph")]
  • Click Merge multiple rows of data into one
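To see what the XPath and the merge option in Step 3 accomplish, here is a minimal Python sketch using only the standard library. It collects every p element whose data-testid attribute contains "paragraph" and joins the results into one string. The sample HTML, the ParagraphCollector class, and the newline separator are assumptions made for illustration; the real Reuters markup and the separator Octoparse uses when merging rows may differ.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements whose data-testid contains "paragraph".

    This mimics the XPath //p[contains(@data-testid,"paragraph")].
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []    # one entry per matched <p>
        self._inside = False    # True while inside a matched <p>
        self._buffer = []       # text fragments of the current <p>

    def handle_starttag(self, tag, attrs):
        test_id = dict(attrs).get("data-testid") or ""
        if tag == "p" and "paragraph" in test_id:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self.paragraphs.append("".join(self._buffer).strip())
            self._inside = False


# Made-up markup for illustration only.
SAMPLE_HTML = """
<article>
  <h1>Sample headline</h1>
  <p data-testid="paragraph-0">First paragraph.</p>
  <p class="caption">Photo caption, not body text.</p>
  <p data-testid="paragraph-1">Second paragraph with a <a href="#">link</a>.</p>
</article>
"""


def merge_paragraphs(html):
    """Return all matched paragraphs merged into a single string."""
    collector = ParagraphCollector()
    collector.feed(html)
    # Assumed separator: one newline between paragraphs.
    return "\n".join(collector.paragraphs)


merged_body = merge_paragraphs(SAMPLE_HTML)
print(merged_body)
```

The caption paragraph is skipped because it has no matching data-testid, which is why the customized XPath is more reliable than selecting every p element on the page.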


Step 4. Save the task and run it to get data

  • Click Save on the upper right to save your task
  • Click Run next to it and wait for a Run Task window to pop up
  • Select Run on your device to run the task on your local device


Happy Data Hunting!

Author: The Octoparse Team
