You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!
Bloomberg, one of the largest global financial websites, delivers business and market news, data, analysis, and video, featuring stories from Businessweek and Bloomberg News. The site covers markets, technology, politics, and wealth. In this tutorial, we will use Octoparse to scrape Bloomberg news about Covid and extract the image URL, title, author, and summary of each news item.
The case URL is provided below:
The main steps are shown in the menu on the right. [Download task file here]
1. Go to Web Page - to open the target website
To start web scraping, we need to first enter the website URL.
Enter the Bloomberg search URL into the search box at the center of the home screen, and click Start to create a new task with Advanced Mode.
Note: If you encounter a robot verification, complete it in browse mode, and remember to turn browse mode off before continuing with the next steps.
2. Auto-detect the web page - to create a workflow
On this page, the auto-detect function can identify the data fields for us.
Click Auto-detect web page in the Tips panel and wait for the detection to complete
Check the data fields in the Data Preview, then delete unwanted fields or rename them if needed
Click Create workflow to generate a workflow
The workflow will be created as shown below:
3. Modify the XPath of Loop Item - to locate the news item accurately
Click Loop Item 1 to open its settings
Enter the following Matching XPath, which selects each news item:
//div[contains(@class,'storyItem')]
Click Apply to save the settings
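To see why this XPath is a reasonable choice for the Loop Item, it can help to try the same matching rule outside Octoparse. The sketch below is only an illustration: the HTML is made-up, simplified markup (the real Bloomberg page structure may differ), and the function name match_loop_items is hypothetical. Python's standard library does not support the XPath contains() function, so the sketch emulates the predicate with a plain substring check on the class attribute, which is what contains(@class,'storyItem') does.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup for illustration only.
# The class suffixes are an assumption; the real page may differ.
SAMPLE_HTML = """
<html><body>
  <div class="storyItem__abc123"><a class="headline">First story</a></div>
  <div class="navigation">Menu</div>
  <div class="search storyItem__def456"><a class="headline">Second story</a></div>
</body></html>
""".strip()


def match_loop_items(root, fragment="storyItem"):
    """Emulate //div[contains(@class, fragment)] using only the standard library."""
    # ElementTree's XPath subset has no contains(), so filter in Python:
    # keep every <div> whose class attribute contains the fragment as a substring.
    return [div for div in root.iter("div") if fragment in div.get("class", "")]


root = ET.fromstring(SAMPLE_HTML)
items = match_loop_items(root)

# Each matched <div> plays the role of one Loop Item;
# here we read the headline text from its direct <a> child.
titles = [item.findtext("a") for item in items]
print(titles)
```

Because contains() is a substring match, the rule still selects items whose class name carries extra tokens or a generated suffix, while unrelated blocks such as the navigation div are skipped.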
4. Run the task - to get the final data
Click the Save button first to save all the settings you have made
Click Run to run your task either locally or in the cloud
Here we select Run on your device to run the task locally, then wait for it to complete
Here is the sample output from the local run.
TIP: Local runs are great for quick runs and small amounts of data. If you are dealing with more complicated tasks or large volumes of data, Run in the Cloud is recommended for higher speed. You are very welcome to try the premium feature by signing up for the 14-day free trial here. Tasks can be scheduled to run hourly, daily, or weekly, with the data delivered regularly.