Scraping Dynamic Websites (Example: Bloomberg)
Wednesday, January 11, 2017 6:33 AM
Octoparse enables you to scrape dynamic websites, that is, pages that use AJAX to load content without refreshing the whole web page.
In this web scraping tutorial we will show you how to scrape dynamic content from a website such as bloomberg.com. We will scrape the latest technology news articles, including each article's title, body text, published date, and author, with Octoparse. Getting this real-time dynamic data involves two parts: making a scraping task, and scheduling the task on Octoparse's cloud platform.
The website URL we will use is https://www.bloomberg.com/technology.
The data fields include the article title, the body text of the article, the published date, and the article author.
You can download the extraction task for this tutorial (the .otd file) HERE to begin collecting the data right away, or follow the steps below to build a scraping task that scrapes the latest tech news articles from bloomberg.com.
Part 1. Make a scraping task in Octoparse
Step 1. Set up basic information.
Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜Complete basic information ➜ Click "Next".
Step 2. Enter the target URL in the built-in browser. ➜ Click the "Go" icon to open the webpage.
(URL of the example: https://www.bloomberg.com/technology)
Step 3. Right click the first article under the subtitle "Americas" of the Global News section. ➜ Create a list of articles with a similar layout. Click "Create a list of items" (articles with similar layout). ➜ "Add current item to the list".
The first article is now added to the list. ➜ Click "Continue to edit the list".
Right click the second article ➜ Click "Add current item to the list" again (now we have all the articles with a similar layout) ➜ Click "Finish Creating List" ➜ Click "loop" to process the list and extract the content.
Note that when we add the second article to the list, Octoparse automatically adds all of the remaining articles, including those under Europe and Asia, to the "Loop Item" box, as we can see in the item list.
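Conceptually, the "Loop Item" step iterates over every article node that shares the same layout. A minimal standard-library Python sketch of that idea is below; the page fragment and its class names are made up for illustration, not Bloomberg's real markup (which is loaded via AJAX, so a plain HTTP fetch may not even contain it):

```python
from html.parser import HTMLParser

# Hypothetical page fragment; real Bloomberg markup differs.
SAMPLE = """
<div class="story"><a href="/news/tech-1">First article</a></div>
<div class="story"><a href="/news/tech-2">Second article</a></div>
<div class="story"><a href="/news/tech-3">Third article</a></div>
"""

class ArticleLinkCollector(HTMLParser):
    """Collects every link found inside a div with class "story"."""

    def __init__(self):
        super().__init__()
        self.in_story = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "story":
            self.in_story = True
        elif tag == "a" and self.in_story:
            self.links.append(attrs.get("href"))

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_story = False

collector = ArticleLinkCollector()
collector.feed(SAMPLE)
print(collector.links)  # every "item" the loop would visit
```

Each collected link corresponds to one pass of the "Loop Item" box: Octoparse visits the matched item and runs the extraction actions on it.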
Step 4. Extract the content of the article.
Right click the title of the article ➜ Select "Extract text". The other fields can be extracted in the same way.
All the selected content will appear under Data Fields. ➜ Click a "Field Name" to modify it, then click "Save".
Note: Right click the content to avoid triggering its hyperlink, if necessary.
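In code terms, the "Extract text" actions amount to pulling named fields out of each article page. A sketch using the standard library's xml.etree on a hypothetical, well-formed article snippet (the tag and class names are illustrative only, not Bloomberg's actual markup):

```python
import xml.etree.ElementTree as ET

# Hypothetical article snippet, shaped like the fields we chose:
# title, body text, published date, and author.
ARTICLE = """
<article>
  <h1>Example headline</h1>
  <time datetime="2017-01-11">January 11, 2017</time>
  <span class="author">Jane Doe</span>
  <div class="body">First paragraph. Second paragraph.</div>
</article>
"""

root = ET.fromstring(ARTICLE)
record = {
    "title": root.findtext("h1"),
    "published": root.find("time").get("datetime"),
    "author": root.findtext("span[@class='author']"),
    "body": root.findtext("div[@class='body']"),
}
print(record)
```

One such record per looped article is what ends up as one row in the "Data Extracted" pane.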
Step 5. Check the workflow.
Now we need to check the workflow by clicking through the actions from the beginning of the workflow, making sure that we can scrape the AJAX content from the pages.
Go to Web Page ➜ The Loop Item box ➜ Click Item ➜ Extract Data.
Note: If the URL keeps loading even though the content of the website has fully loaded, you can click the (×) sign to stop it from loading.
Step 6. Click "Save" to save your configuration. Then click "Next" ➜ Click "Next" ➜ Click "Local Extraction" to run the task on your computer. Octoparse will automatically extract all the data selected.
Step 7. The extracted data will be shown in the "Data Extracted" pane. Click the "Export" button to export the results to an Excel file, a database, or another format and save the file to your computer.
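Octoparse handles the export through its UI; for reference, the equivalent in code is writing the extracted records to CSV, which Excel opens directly. A minimal stdlib sketch with a couple of hypothetical rows shaped like the fields from Step 4:

```python
import csv
import io

# Hypothetical extracted rows (title, published date, author).
rows = [
    {"title": "Example headline", "published": "2017-01-11", "author": "Jane Doe"},
    {"title": "Another story", "published": "2017-01-12", "author": "John Roe"},
]

# io.StringIO keeps the example self-contained; swap in
# open("articles.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "published", "author"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```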
Part 2. Schedule a task and run it on Octoparse's cloud platform.
Once you have built the scraping task by following the steps above, you can schedule it to run on Octoparse's cloud platform.
Step 1. Find out the task you've just made ➜ double click the task to open it ➜ keep clicking "Next" until you are in the "Done" step ➜ Select the option “Schedule Cloud Extraction Settings” to begin the scheduling process.
Step 2. Set the parameters.
In the “Schedule Cloud Extraction Settings” dialog box, you can select the Periods of Availability for your task's extraction and the Run Mode, which determines how often the periodic task runs to collect data.
· Periods of Availability - the data extraction window, set with a Start date and an End date.
· Run Mode - Once, Weekly, Monthly, Real Time
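To make the Run Mode options concrete, each one boils down to "when is the next run after this one?". The sketch below is a simplified model of that logic, not Octoparse's actual scheduler; the Real Time interval and the 30-day month are assumptions for illustration:

```python
from datetime import datetime, timedelta
from typing import Optional

def next_run(last_run: datetime, mode: str) -> Optional[datetime]:
    """Next scheduled start under a simplified model of the run modes.

    "Once" has no follow-up run. The Real Time interval (1 minute)
    and the fixed 30-day month are assumptions, not Octoparse's
    actual cloud scheduling behavior.
    """
    if mode == "Once":
        return None
    if mode == "Weekly":
        return last_run + timedelta(weeks=1)
    if mode == "Monthly":
        return last_run + timedelta(days=30)
    if mode == "Real Time":
        return last_run + timedelta(minutes=1)
    raise ValueError(f"unknown run mode: {mode}")

start = datetime(2017, 1, 11, 6, 33)
print(next_run(start, "Weekly"))  # 2017-01-18 06:33:00
```

The Periods of Availability then simply bound these runs: once the computed next run falls after the End date, the task stops being queued.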
After you click "OK" in the "Schedule Cloud Extraction Settings" dialog box, the task will be added to the waiting queue, and you can check its status there.
Author: The Octoparse Team
For more information about Octoparse, please click here.
Sign up today!