How Web Scraping Works for Content Aggregation


In this article, I will show you what content curation or content aggregation is, why you should start to curate content, and how web scraping can help you aggregate content step by step.

What is content curation/aggregation?

Content curation often gets mixed up with content aggregation and content syndication. 

  1. Content syndication: republishing the same content from one source on a third-party website, with a credit to the source so the audience can refer back to it. For example, law firms must stay up to date and keep an eye on newsletters as new bills race to get passed. An automated aggregator can scrape multiple sources daily and syndicate the results to a database for future reference.
  2. Content aggregation: grouping information under one common topic using one or more keywords. For example, a hashtag on Twitter groups topics and information together, so whenever you click a hashtag, the related posts are pulled up. A search engine results page (SERP) is another great example of content aggregation. When you type in the keywords “digital marketing”, all kinds of content appear in the search results, including white-hat marketing tips and marketing software.
  3. Content curation: selecting the most valuable pieces and adding value on top of the collected information. SEO is a popular application for content curation, and curated content has become trendy on Google. For example, in the article Best Data Scraping Tools for 2019 (Top 10 Reviews), I did thorough research on web scraping tools and chose the most outstanding ones along with their pros and cons. Another great example of content curation is Nextdraft, founded by Dave Pell, who provides his own summary and perspective on the ten most interesting and newsworthy stories, with links to the originals.

Why should you curate content?

Content curation is a very popular business model. It collects in-depth resources on a niche subject, and you can make money from advertising and affiliate marketing. People love curated content because the explosion of information on the internet makes it harder for them to find what meets their needs. Content curation helps in three ways:

  • It provides guidance, so people save a huge amount of time they would otherwise spend exploring the internet.
  • It selects and categorizes the most valuable information across a topic. The audience gets several alternatives within that scope to choose from, while poor content is left out and never presented.
  • It adds insight and commentary. A curator like Nextdraft comments on the news and, by sharing and teaching, builds a reputation.

How to make money from content curation

  • Ads: Displaying ads is the best way to make passive income. You can use a third-party service to find advertisers or use plugins like Ads Pro Plugin.
  • Promotion: If you are familiar with makeup YouTubers, promotion won’t be a foreign concept. Once you build an audience, it has value to businesses. Companies love to pay you to feature their products and services.
  • Affiliate Program: You promote a company’s products and services on your site, drive as many paying users as you can, and get a commission for each sale you make. Amazon Associates is a great example of affiliate marketing: it offers commission-based opportunities for experienced marketers who can drive traffic.
  • Membership Subscription: Medium is a well-known blogging platform. It charges subscribed readers $5 per month to access great articles. The idea is to get your readers to pay you to curate up-to-date and valuable content for them. Make sure your curated content not only fits their needs but also adds extra value.
  • Email List: Once subscribers register with their email addresses, you can build an email list through which you promote your products or interesting content. A classic example is Moz. It looks very tempting to me.

There are many forms of content curation to choose from

  • News site: Buzzfeed both curates and creates content. It covers many interesting topics, from “32 things people heard in sex” to “28 Tumblr Posts that will make you laugh no matter what.” It is great reading in bed on a Saturday morning.
  • Internet Mall: ThisIsWhyImBroke is a very interesting online shopping website that curates weird and interesting stuff. Retailers pay the website to promote their products. The owner makes at least $20,000 each month from affiliate marketing alone.
  • Social Media: Pinterest is a social media site with tons of great photos and images. I use Pinterest for ideas when I furnish the bedroom.
  • YouTube Video: Since roundups became the new black on Google, I made a roundup video counting down the best web scraping tools. It did pretty well.
  • Event Sites: Company events, concerts, dance parties, recruitment, farm sales, marathons, mud wrestling, and so on. Eventtribe is where I like to search on weekends.

How to create a great content aggregator

How do you create an aggregator website that makes money?

  • Find your niche: Don’t confuse your readers. If you want to help people find the best deals online, they will be puzzled when you post investment news. To maximize results, start small with a niche in which you specialize.
  • Start collecting content:

Normally, people use RSS to collect content. RSS stands for Really Simple Syndication. It is designed to bring updates from all your favorite websites into one place. The problem is that a raw feed is not user-friendly: it is XML markup that looks like programming code, so you need a reader to present it. The idea here is to use RSS to connect to and syndicate from your targeted websites, then use Google Reader to read and pick the best content, and finally compile and present it in a way that is meaningful to your audience.

Step 1: Get the Google Reader extension

Step 2: Add an RSS extension to Chrome (or any web browser)

Step 3: Find a site that has an RSS feed, then subscribe to the feed with the RSS extension you just added

Step 4: Pick several pieces of content and compile them together.

Step 5: Reorganize the content and add your thoughts and insights along with the original links.
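If you would rather see what a reader does with a feed, here is a minimal Python sketch of Steps 3 to 5. It is only an illustration: the sample feed, the example.com links, and the helper names parse_feed and pick are made up for this article, and a real feed would be downloaded from the site instead of being pasted in as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; a real one would be fetched from a site's feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Deals Blog</title>
    <item>
      <title>Best laptop deals this week</title>
      <link>https://example.com/laptop-deals</link>
      <pubDate>Mon, 06 Jan 2020 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Five budget headphones worth buying</title>
      <link>https://example.com/budget-headphones</link>
      <pubDate>Tue, 07 Jan 2020 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Turn RSS 2.0 text into a list of {title, link, published} dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


def pick(items, keyword):
    """Keep only items whose title mentions the keyword (case-insensitive)."""
    return [it for it in items if keyword.lower() in it["title"].lower()]


if __name__ == "__main__":
    # Pick the entries that fit the niche and print them with their original links.
    for entry in pick(parse_feed(SAMPLE_FEED), "laptop"):
        print(f'{entry["title"]} ({entry["link"]})')
```

The keyword filter only stands in for your own judgment in Step 4. The commentary you add in Step 5 is still the part that makes it curation.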

How Web Scraping can help you curate content

The problem comes when a website doesn’t have an RSS feed. Many people consider RSS outdated and no longer use it. This is where web scraping comes into play as a more convenient and automated solution. With a web scraping tool, you can extract data from a page without worrying about coding. For example, with Octoparse, you can extract news articles with their date, author, content, and URL in a structured format directly into your database. Let’s take my favorite news outlet, Reuters News, as an example.
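Before walking through the tool, it may help to see what "structured format" means in practice. The Python sketch below pulls the title, author, date, and content out of one page and returns them as a record. It is not how Octoparse works internally: the sample markup, the class names (headline, author, date, body), and the function extract_article are all assumptions made for illustration, and real sites use their own markup.

```python
from html.parser import HTMLParser

# Hypothetical article markup; real sites use their own tags and class names.
SAMPLE_HTML = """
<article>
  <h1 class="headline">Markets rally as rates hold</h1>
  <span class="author">Jane Doe</span>
  <time class="date" datetime="2020-01-06">January 6, 2020</time>
  <div class="body">Stocks rose on Monday.</div>
</article>
"""

# Map each (assumed) CSS class to the field name wanted in the output record.
FIELDS = {"headline": "title", "author": "author", "date": "date", "body": "content"}


class ArticleParser(HTMLParser):
    """Collect the text of elements whose class is listed in FIELDS."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in FIELDS:
            self._current = FIELDS[css_class]
            self.record[self._current] = ""

    def handle_data(self, data):
        if self._current:
            self.record[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: stop capturing at the first closing tag, so text
        # inside nested tags after that point would be missed.
        self._current = None


def extract_article(html, url):
    """Return one structured record (a dict) for the article at `url`."""
    parser = ArticleParser()
    parser.feed(html)
    record = {field: text.strip() for field, text in parser.record.items()}
    record["url"] = url
    return record


if __name__ == "__main__":
    print(extract_article(SAMPLE_HTML, "https://example.com/markets-rally"))
```

Each record has the same fields, which is what lets you load many articles into one database table later on.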

The goal here is to scrape all newly released articles regularly. I use Octoparse.

Step 1. Follow the video tutorial and create a new extraction task.

Step 2. Click “Enable Incremental Extraction” in the Task Settings to extract only newly released articles.

Step 3. Schedule cloud extraction at a 30-minute interval from the Dashboard Panel.
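To make the idea behind incremental extraction concrete, here is a small Python sketch. It shows one common approach, remembering the URLs already collected and keeping only unseen ones on each run. Whether Octoparse does it this way is not something this article covers, and the function name extract_new and the example.com URLs are made up.

```python
def extract_new(scraped, seen_urls):
    """Return only articles whose URL has not been seen, and remember them."""
    fresh = []
    for article in scraped:
        if article["url"] not in seen_urls:
            seen_urls.add(article["url"])
            fresh.append(article)
    return fresh


if __name__ == "__main__":
    seen = set()  # in a real setup this would be stored between runs

    # First scheduled run: both articles are new.
    run_1 = [
        {"url": "https://example.com/a", "title": "Story A"},
        {"url": "https://example.com/b", "title": "Story B"},
    ]
    print([a["title"] for a in extract_new(run_1, seen)])

    # Second run 30 minutes later: only Story C has not been seen before.
    run_2 = [
        {"url": "https://example.com/b", "title": "Story B"},
        {"url": "https://example.com/c", "title": "Story C"},
    ]
    print([a["title"] for a in extract_new(run_2, seen)])
```

Using the URL as the key is a simplification. A site that republishes the same story under a new URL would need a different key, such as the title plus the date.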

This completes a news aggregator for one news outlet. Extracted data is saved in the cloud or delivered to your database through the API. In the same manner, you can create 10 or even 100 crawlers to aggregate news articles from other sources regularly. Octoparse will add each news article to your database as it gets released.
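On the database side, many crawlers feeding one table could look roughly like the Python sketch below. It uses SQLite only because it ships with Python. The table layout, the source names, and the helpers open_store and save_articles are assumptions for illustration, not part of Octoparse or its API.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the article table; the URL is the unique key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, source TEXT, title TEXT)"
    )
    return conn


def count_articles(conn):
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]


def save_articles(conn, source, articles):
    """Insert articles from one crawler, skipping URLs already stored.

    Returns the number of rows that were actually new.
    """
    before = count_articles(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO articles (url, source, title) VALUES (?, ?, ?)",
        [(a["url"], source, a["title"]) for a in articles],
    )
    conn.commit()
    return count_articles(conn) - before


if __name__ == "__main__":
    store = open_store()
    # Two hypothetical crawlers delivering their latest batches.
    save_articles(store, "site-a", [{"url": "https://example.com/a1", "title": "A1"}])
    save_articles(store, "site-b", [{"url": "https://example.com/b1", "title": "B1"}])
    for row in store.execute("SELECT source, title, url FROM articles ORDER BY source"):
        print(row)
```

Keeping a source column makes it easy to see later which crawler contributed which article, which helps when you credit the original links.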
