Customize News Aggregator with Web Scraping | 2020 Guide

Tuesday, August 11, 2020

The volume of news and information on the Internet is overwhelming. Just think of how many news feeds are updated every second. What’s more, that news is scattered across different websites and platforms. With limited time, finding and visiting every story you’re interested in is unrealistic.

 

So how can you gather all that news in one place without repetitive, tedious browsing? There are two options:

- Use a News Aggregator Application.

- Customize your own News Aggregator with a web scraping tool (like Octoparse).

 

If you simply want to browse the information, a News Aggregator Application is the easiest and most convenient option. However, if you want to extract business value from the news available on the Internet, a customized News Aggregator is the better choice.

 

This article will dive deeply into News Aggregation, introducing its business value and how to build your own News Aggregator with Octoparse.

Part 1: What is News Aggregation?

Part 2: How does web scraping contribute to News Aggregation?

Part 3: How to create a web scraper to aggregate Financial news?

 

 

Part 1: What is News Aggregation?

News Aggregation is the process of collecting news from a variety of sources and making it accessible in one place. You may be more familiar with related terms such as news aggregator, news reader, feed reader, and RSS reader. They all work on the same principle: gather the news and store it in a handy location, either on your own computer or in the cloud.
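To make the principle concrete, here is a minimal Python sketch of an RSS-style aggregator. It is an illustration only: the two feeds are hypothetical RSS 2.0 snippets embedded as strings (a real aggregator would download them), and the helper names `parse_feed` and `aggregate` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Two tiny RSS 2.0 documents standing in for real feeds (hypothetical content).
FEED_A = """<rss version="2.0"><channel><title>Site A</title>
<item><title>Markets rally</title><link>https://a.example/markets</link></item>
<item><title>Cup final preview</title><link>https://a.example/cup</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Site B</title>
<item><title>Cup final preview</title><link>https://a.example/cup</link></item>
<item><title>New phone launch</title><link>https://b.example/phone</link></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Return a list of {source, title, link} dicts from one RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]


def aggregate(feeds):
    """Merge several feeds into one list, keeping the first copy of each link."""
    seen, merged = set(), []
    for xml_text in feeds:
        for entry in parse_feed(xml_text):
            if entry["link"] not in seen:
                seen.add(entry["link"])
                merged.append(entry)
    return merged


if __name__ == "__main__":
    for entry in aggregate([FEED_A, FEED_B]):
        print(f'{entry["source"]}: {entry["title"]} ({entry["link"]})')
```

Deduplicating on the link is one simple choice; stories that appear under different URLs on different sites would need a fuzzier match, such as comparing titles.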

We can also extend News Aggregation to all kinds of Content Aggregation. With a set of content aggregators, we can access the information and data we need whenever we want.

Here are three examples:

| Type of Aggregation | Purpose | User scenario |
| --- | --- | --- |
| Blog Aggregation | Collect blog information such as the title, author bio, brief introduction, and URL. | If you need to deliver the latest blog posts to the audience subscribed to your RSS feed, a blog aggregator can help you gather the information effectively. |
| Social Media info Aggregation | Collect the data you want from multiple social media platforms. | Digital marketers need to know their audience’s attitudes, and this information can shed light on marketing strategy and product improvement. |
| Ecommerce-info Aggregation | Collect product information across various platforms, like Amazon and Best Buy. | If you’re running an online business, Ecommerce-info Aggregation can help you with price monitoring, competitor monitoring, etc. |

Part 2: How does web scraping contribute to News Aggregation?

Web scraping is a technique for extracting data from websites. We can either create a web scraper with a tool (like Octoparse) or build one from scratch in a programming language such as Python, R, or JavaScript. Either way, web scraping is the core of News Aggregation. It lets you:

- Collect news information effectively

- Export the scraped data to Excel or directly via API

- Update to the latest news at a set frequency
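For readers curious about the build-it-from-scratch route, the following Python sketch shows the first two points using only the standard library. It rests on assumptions: the HTML is a hypothetical sample, the `headline` class name is invented (real news sites use their own markup), and CSV stands in for an Excel-readable export.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: real news sites use their own tags and class names.
SAMPLE_HTML = """
<ul>
  <li><a class="headline" href="/news/1">Team wins title</a></li>
  <li><a class="nav" href="/about">About us</a></li>
  <li><a class="headline" href="/news/2">Coach steps down</a></li>
</ul>
"""


class HeadlineParser(HTMLParser):
    """Collect the text and href of every <a class="headline"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_headline = False  # True while inside a headline link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "headline":
            self._in_headline = True
            self.rows.append({"title": "", "url": attrs.get("href", "")})

    def handle_data(self, data):
        if self._in_headline:
            self.rows[-1]["title"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_headline = False


def scrape(html):
    """Parse an HTML string and return the headline rows found in it."""
    parser = HeadlineParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Serialize rows to CSV text, a format Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(scrape(SAMPLE_HTML)))
```

Even this small sketch depends on the exact structure of the page, which is the main maintenance cost of hand-written scrapers and one reason a visual tool can be attractive.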

 

Part 3: How to create a web scraper to aggregate Financial news?

With Octoparse, anyone can create a web scraper for news sites without coding. Once you have read the short guide below, you can do it too!

I’ll use Yahoo Sports as an example to show you how to create a sports news aggregator.

 

Yahoo sports

 

Prerequisites:

- Download Octoparse on your computer.

- Go through Octoparse Scraping 101 to get familiar with how it works.

 

Let’s get started!

1) Start a task

Open Octoparse on your computer. Enter the URL in the box and click “Start”.

 

Enter a website URL and click "Start"

 

After you click “Start”, the built-in browser opens within a second. Wait a moment for the page to load. In the meantime, you can find the Tips Panel in the bottom corner.

 

Start auto-detection

 

Click the “Auto-detect web page data” option and Octoparse will auto-detect the data available on the current page.

 

Auto-detection loading

 

2) Go with auto-detection 

Once auto-detection finishes, Octoparse shows you the data it has detected (highlighted in red). If that's what you need, simply click “Create workflow” on the Tips Panel.

If that’s not what you need, you can choose “switch auto-detect results” to scrape other sets of information.

 

Create workflow or switch results

 

3) Run the task

Now you can see that the workflow has been created automatically with only a few clicks. You can check the settings and make minor revisions (if necessary) on the workflow bar according to your needs.

In most cases, however, you can simply click “Run the task” to get the data directly.

 

Click run to run the task

 

4) Options of running

There are three options in Octoparse to run the task.

Because news changes constantly, you will most likely want to collect updates at regular intervals. When you run the task, you can choose “Schedule task” to set the start time and how often the data is updated.
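For comparison, the idea behind a scheduled run can be sketched in a few lines of Python. This is not how Octoparse implements scheduling; it is a hypothetical illustration in which `fetch` is any function that returns the current list of news items, and `merge_new` and `run_on_schedule` are names invented for this example.

```python
import time


def merge_new(store, fetched):
    """Add items whose URL is not yet in the store; return how many were new."""
    added = 0
    for item in fetched:
        if item["url"] not in store:
            store[item["url"]] = item
            added += 1
    return added


def run_on_schedule(fetch, store, interval_seconds, runs, sleep=time.sleep):
    """Call fetch() `runs` times, pausing interval_seconds between calls.

    Returns the number of new items found on each run.
    """
    counts = []
    for i in range(runs):
        counts.append(merge_new(store, fetch()))
        if i < runs - 1:  # no need to wait after the final run
            sleep(interval_seconds)
    return counts


if __name__ == "__main__":
    # Hypothetical fetcher: each call returns the "current" front page.
    pages = iter([
        [{"url": "/news/1", "title": "Team wins title"}],
        [{"url": "/news/1", "title": "Team wins title"},
         {"url": "/news/2", "title": "Coach steps down"}],
    ])
    store = {}
    print(run_on_schedule(lambda: next(pages), store, interval_seconds=1, runs=2))
    print(sorted(store))
```

Keying the store by URL means a story that is still on the page at the next run is not saved twice, so each run only adds what is new.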

 

Run task options and schedule settings

 

By following the steps above, you have just built your own sports news aggregator in Octoparse!

If you have any problems creating a news aggregator, please feel free to contact us at support@octoparse.com.

 

Nowadays, the ability to extract value from data matters more and more for career development. By building your own web scraper, you can get exactly the information you need. Furthermore, news aggregation with Octoparse gives you a head start because it keeps you abreast of the latest news.

Try Octoparse for FREE to start your News aggregation project!

 

 

Author: Erika

Edited by Cici
