How Web Scraping Works for Content Aggregation
Monday, January 25, 2021
In this article, I will show you what content curation or content aggregation is, why you should start curating content, and how web scraping can help you aggregate content, step by step.
What is content curation/aggregation?
Content curation often gets mixed up with content aggregation and content syndication.
- Content aggregation: grouping information under one common topic with one or more keywords. For example, a hashtag on Twitter groups topics and information together, so whenever you click a hashtag, the related posts are pulled up. A search engine results page (SERP) is another great example of content aggregation. When you type in the keywords “digital marketing”, all kinds of content, including whitehat marketing tips and marketing software, appear in the search results.
- Content curation: selecting the most valuable pieces and adding value on top of the collected information. SEO is a popular application of content curation, and curated content has become popular on Google. For example, in the article Best Data Scraping Tools for 2019 (Top 10 Reviews), I did thorough research on web scraping tools and chose the most outstanding ones, along with their pros and cons. Another great example of content curation is Nextdraft, founded by Dave Pell, who provides his own summaries and perspective on the ten most interesting and newsworthy stories, with original links.
- Content syndication: passing the same content from one source to a third-party website, along with its source for the audience to reference. For example, law firms must stay up to date and get eyeballs on their newsletters as new bills race to get passed. An automated aggregator can scrape multiple sources daily and syndicate the results to a database for future reference.
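To make the aggregation idea concrete, here is a minimal Python sketch of grouping posts under shared hashtags. The sample posts and the `group_by_hashtag` helper are made up for illustration; this is not how Twitter or any search engine actually works internally.

```python
import re
from collections import defaultdict

# Hypothetical sample posts; real input would come from a feed or a scraper.
POSTS = [
    "Five whitehat tips for #DigitalMarketing beginners",
    "Our favorite #marketing software this year #digitalmarketing",
    "Weekend reading on #webscraping",
]

HASHTAG = re.compile(r"#(\w+)")


def group_by_hashtag(posts):
    """Map each lowercased hashtag to the posts that mention it."""
    groups = defaultdict(list)
    for post in posts:
        # A set avoids listing a post twice when it repeats a tag.
        for tag in {t.lower() for t in HASHTAG.findall(post)}:
            groups[tag].append(post)
    return dict(groups)


if __name__ == "__main__":
    for tag, items in sorted(group_by_hashtag(POSTS).items()):
        print(f"#{tag}: {len(items)} post(s)")
```

Clicking a hashtag then amounts to looking up one key in the resulting dictionary.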
Why should you curate content?
Content curation is a very popular business model. You collect in-depth resources on a niche subject, and you can make money from advertising and affiliate marketing. People love content curation because the explosion of information on the internet makes it harder for them to pick out what meets their needs. Content curation allows you to:
- Provide guidance, so that people save a huge amount of time they would otherwise spend exploring the internet.
- Select and categorize the most valuable information across a breadth of sources. This gives the audience a range of alternatives within scope to pick from, while bad content is left out and never presented.
- Build a reputation. A curator like Nextdraft provides insights and commentary on the news, and therefore builds up his or her reputation by sharing and teaching.
How to make money from content curation
- Ads: Displaying ads is a popular way to make passive income. You can use a third-party service to find advertisers, or use a plugin like Ads Pro Plugin.
- Promotion: If you are familiar with makeup YouTubers, promotion won’t be a foreign concept. Once you build up your audience, it has value to businesses. Companies will gladly pay you to feature their products and services.
- Affiliate Program: You promote a company’s products and services on your site, drive as many paying users as you can, and get a commission for each sale you make. Amazon Associates is a great example of affiliate marketing: it offers a sales-commission-based opportunity for experienced marketers who can drive traffic. Octoparse provides an affiliate opportunity as well; check it out for detailed information.
- Membership Subscription: Medium is a well-known blogging platform. It charges $5 per month for subscribed readers to access great articles. The idea is that you can get your readers to pay you to curate up-to-date and valuable content for them. Make sure your curated content not only fits their needs but also adds extra value.
- Email List: Once you have subscribers who register with their email addresses, you can build an email list through which you promote your products or interesting content. A classic example is Moz. It looks very tempting to me.
There are many forms of content curation for you to choose from:
- News sites: Buzzfeed both curates and creates content. It has so many interesting topics, from “32 things people heard in sex” to “28 Tumblr Posts that will make you laugh no matter what.” It is great to read in bed on a Saturday morning.
- Internet Mall: ThisIsWhyImBroke is a very interesting online shopping website that curates weird and interesting stuff. Retailers pay the website to promote their products. The owner makes at least $20,000 each month from affiliate marketing alone.
- Social Media: Pinterest is a social media site with tons of great photos and images. I used Pinterest for ideas when I furnished my bedroom.
- Youtube Video: Since roundups have become the new black on Google, I made a roundup video counting down the best web scraping tools. It was pretty successful.
- Event Sites: Company events, concerts, dance parties, recruiting events, farm sales, marathons, mud wrestling, and so on: Eventtribe is the place I like to search during the weekend.
How to create a great content aggregator
- Find your niche: Don’t confuse your readers. If you want to help people find the best deals online, they will be puzzled when you post investment news. To maximize results, start small, in an area where you specialize.
- Start collecting content:
Normally, people use RSS to collect content. RSS stands for Really Simple Syndication. It’s designed to bring updates from all your favorite websites together in one place. The problem is that a raw feed is not user-friendly, because it appears as XML code. So we have to use a reader to present it. The idea here is that we use RSS to connect to and syndicate from the target websites, then use a feed reader such as the Google Reader extension to read and pick the best content, and finally compile and present it in a way that is meaningful to your audience.
Step 1: Get the Google Reader extension.
Step 2: Add an RSS extension to Chrome (or any web browser).
Step 3: Find a site that has an RSS feed, then subscribe to the feed with the RSS extension you just added.
Step 4: Pick several pieces of content and compile them together.
Step 5: Reorganize the content, and add your thoughts and insights along with the original links.
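If you are curious what a reader does with the raw feed, the steps above can be sketched in a few lines of Python using only the standard library. The feed text, the example.com URLs, and the helper names `parse_rss` and `compile_digest` are all made up for illustration, and the sketch assumes a plain RSS 2.0 feed; a real reader would download the feed from the site and handle many more cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document; in practice this text would be downloaded
# from a site's feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Deals Blog</title>
    <item>
      <title>Best laptop deals this week</title>
      <link>https://example.com/laptop-deals</link>
      <pubDate>Mon, 25 Jan 2021 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Five coupons worth clipping</title>
      <link>https://example.com/coupons</link>
      <pubDate>Sun, 24 Jan 2021 08:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item> with its title, link, and publication date."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return entries


def compile_digest(entries, comments):
    """Format the picked entries with the curator's comment and original link.

    `comments` maps a link to the curator's note; entries without a note
    are treated as not picked and are skipped.
    """
    lines = []
    for entry in entries:
        comment = comments.get(entry["link"])
        if comment is None:
            continue
        lines.append(f"{entry['title']}: {comment} (source: {entry['link']})")
    return "\n".join(lines)


if __name__ == "__main__":
    picked = {"https://example.com/coupons": "Worth a look."}
    print(compile_digest(parse_rss(SAMPLE_FEED), picked))
```

Parsing corresponds to Step 3, while the `comments` dictionary stands in for the human work of Steps 4 and 5: picking the pieces and adding your own insight next to the original link.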
How Web Scraping can help you curate the content:
The problem comes when a website doesn’t have an RSS feed. Many people say RSS is outdated and no longer use it. This is where web scraping comes into play as a more convenient and automated solution. With a web scraping tool, you can extract almost anything without worrying about coding. For example, with Octoparse, you can extract news articles with their date, author, content, and URL in a structured format, directly into your database. Let’s take my favorite news outlet, Reuters, as an example.
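For readers who want to see what a scraping tool automates behind the scenes, here is a rough hand-coded sketch in Python using only the standard library. The HTML snippet, its tag and class names, and the `ArticleParser` class are assumptions for illustration; they do not reflect the real markup of Reuters or the way Octoparse works internally.

```python
from html.parser import HTMLParser

# Hypothetical listing page; real sites use their own markup.
SAMPLE_HTML = """
<article>
  <h2><a href="/world/story-one">Story one</a></h2>
  <span class="author">A. Writer</span>
  <time datetime="2021-01-25">January 25, 2021</time>
</article>
<article>
  <h2><a href="/tech/story-two">Story two</a></h2>
  <span class="author">B. Reporter</span>
  <time datetime="2021-01-24">January 24, 2021</time>
</article>
"""


class ArticleParser(HTMLParser):
    """Collect url, title, author, and date from each <article> element."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._current = None  # dict for the article currently being read
        self._field = None    # text field that incoming data belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._current = {"url": "", "title": "", "author": "", "date": ""}
        elif self._current is None:
            return  # ignore anything outside an <article>
        elif tag == "a" and "href" in attrs:
            self._current["url"] = attrs["href"]
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "author":
            self._field = "author"
        elif tag == "time":
            self._current["date"] = attrs.get("datetime", "")

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "article" and self._current is not None:
            self.articles.append(self._current)
            self._current = None


def extract_articles(html_text):
    """Return a list of structured records, one per article block."""
    parser = ArticleParser()
    parser.feed(html_text)
    parser.close()
    return parser.articles


if __name__ == "__main__":
    for record in extract_articles(SAMPLE_HTML):
        print(record)
```

Each record is a plain dictionary, which is the kind of structured row you would then write to a database. A no-code tool saves you from writing and maintaining a parser like this for every site.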
The goal here is to scrape all newly released articles regularly. I use Octoparse.
Step 1. Follow the video tutorial and create a new extraction task.
Step 2. Click “Enable Incremental Extraction” in the task settings to extract only newly released articles.
In this way, you finish creating a news aggregator for one news outlet. Extracted data is saved in the cloud or delivered to your database through the API. In the same manner, you can create 10 or even 100 crawlers to aggregate news articles from other sources regularly. As a result, Octoparse adds to your database whenever a new article is released.
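The idea behind incremental extraction can be illustrated with a small Python sketch: keep a record of what you have already stored, and add only what is new on each run. This is my own simplified illustration, not Octoparse's implementation. It assumes that a URL uniquely identifies an article, and the table layout and the `save_new_articles` helper are hypothetical.

```python
import sqlite3


def save_new_articles(conn, articles):
    """Insert only articles whose URL is not stored yet; return the new ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, source TEXT)"
    )
    new_articles = []
    for article in articles:
        # Assumption: the URL is a stable, unique key for an article.
        seen = conn.execute(
            "SELECT 1 FROM articles WHERE url = ?", (article["url"],)
        ).fetchone()
        if seen:
            continue
        conn.execute(
            "INSERT INTO articles (url, title, source) VALUES (?, ?, ?)",
            (article["url"], article["title"], article["source"]),
        )
        new_articles.append(article)
    conn.commit()
    return new_articles


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to keep data between runs
    monday = [{"url": "https://example.com/a", "title": "A", "source": "example"}]
    tuesday = monday + [
        {"url": "https://example.com/b", "title": "B", "source": "example"}
    ]
    print(len(save_new_articles(conn, monday)), "new on the first run")
    print(len(save_new_articles(conn, tuesday)), "new on the second run")
```

Running many crawlers against the same table works the same way: each one passes its scraped records to the function, and only unseen URLs are added.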
Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in a way that empowers companies and businesses with actionable insights. Read her blog to discover practical tips and applications of web data extraction.