SEO Data Extraction Tips: 3 Actionable SEO Hacks through Content Scraping


When it comes to SEO, everyone wants to get ahead of the competition, yet the fact is that there are always front-runners ranking higher for a given set of keywords.

How do you improve your SEO performance? Here are 3 web scraping hacks that can help you optimize your SEO.

Optimize Your Page with Web Scraping

  1. XML Sitemaps Optimization
  2. Web Page Optimization
  3. Blog Content Curation

Sitemap optimization

What is an XML sitemap and why should we optimize it?

An XML sitemap is a file that helps Google's spider crawl and index the important URLs of a website. Thus, an excellent XML sitemap should be up to date, error-free, and include only indexable URLs.

Optimizing it helps Google's spider understand the website better, which can lead to a better ranking. It matters most when you are running a medium-sized website. For example, if you're running an eCommerce store on shopify.com, or working on your own blog on wordpress.com, a clean sitemap will help you rank better.

How to optimize your XML sitemaps?

If you have used or heard of a program like Screaming Frog, then you already know web scraping to some degree. Such programs work by scraping metadata, such as the title, meta description, and keywords, from all the web pages under a domain.

To optimize your XML sitemap, you can use Screaming Frog's XML Sitemap Generator. It's a pre-built crawler that scrapes the HTML of the whole website and exports a tidy Excel file for you to work with.

Alternatively, you could use a free web scraper to create an XML sitemap yourself.
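Once a scraper has collected your URLs, turning them into a sitemap is mostly formatting. Here is a minimal sketch using only the Python standard library; the URLs and `lastmod` date are hypothetical placeholders, and a real sitemap would follow the full sitemaps.org protocol (priorities, change frequency, 50,000-URL limit per file).

```python
# Minimal XML sitemap builder: turn a list of crawled URLs into
# sitemap XML following the sitemaps.org namespace.
import xml.etree.ElementTree as ET

def build_sitemap(urls, lastmod=None):
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Example: URLs gathered by your crawler (hypothetical domain)
pages = [
    "https://example.com/",
    "https://example.com/blog/seo-tips",
]
print(build_sitemap(pages, lastmod="2021-01-01"))
```

Save the output as `sitemap.xml` at your site root and submit it in Google Search Console.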

Web page optimization

Web page optimization helps Google read and index the content of a website more easily and quickly, and caters to visitors' preferences. Thus, it's better if the HTML of a website conforms to Google's ranking algorithms.

Apart from the content itself, the most important element in the HTML is arguably the H1 tag. Google's spider treats it as the core of the page.

H1 tag

According to Neil Patel, "80% of the first-page search results in Google use an h1." (https://neilpatel.com/blog/h1-tag/)

Though heading tags matter for ranking, we still need to pay close attention to the meta tags, which are the most direct conversion factors.

Thus, the handiest way to improve a website's ranking is to optimize these tags on a regular basis: a small but mighty action that everyone should take.

In September 2009, Google announced that its ranking algorithms do not use the meta description or meta keywords for web search. However, the meta description still has a great impact on the click-through rate, so it's worth optimizing both the meta description and the title tag.

Tip: To learn more about why meta descriptions and title tags are important, see Meta Description and Title Tag.

How do you use web scraping to optimize your web page? 

To practice it, simply follow the steps below and you will get tag and meta description information neatly organized for later examination. 

Before getting started, download Octoparse 8.1 and install it on your computer. With this web scraping tool in hand, I'll show you how to get the needed tags across all Octoparse blogs as an example; you can do the same for any other domain.

Step 1: Open Octoparse 8.1 and enter the target URL in the box. Click the “Start” button.

Step 2: As you can see, the web page opens in Octoparse's built-in browser. On the left side, there is a workflow area where we can customize the actions as needed.

Now, we'll create a pagination action to go through all the blog pages, plus a Loop Item to visit every blog post. Simply make a few clicks as the following picture shows.

Step 3: Extract the needed information (Titles, Meta descriptions, title tags)

After setting the loop click and pagination, we can start extracting the needed data.

First, click on the title to extract its text; a new "Extract Data" button will appear in the workflow. Double-click the "Extract Data" button, or click its gear icon, to enter the data settings section.

Click the “+” at the corner and point to “page-level data”, now you can add both meta description and meta keywords to your data list.

After adding the needed data fields, click “OK” to save.

Step 4: The final step is to run the task and export the data to Excel or another format. Click "Run" at the top and you will get the data scraped within minutes.
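If you prefer code over a point-and-click tool, the same page-level fields can be pulled with the Python standard library alone. This is a minimal sketch, not Octoparse's method; the sample HTML is invented for illustration, and for real pages you would fetch the HTML first (e.g. with `urllib.request`).

```python
# Pull the same page-level fields Octoparse collects (title, H1,
# meta description) from raw HTML using only the standard library.
from html.parser import HTMLParser

class SeoTagParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "h1": "", "meta_description": ""}
        self._in = None  # tag whose text we are currently capturing

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._in = tag
        elif tag == "meta" and attrs.get("name") == "description":
            self.fields["meta_description"] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == self._in:
            self._in = None

    def handle_data(self, data):
        if self._in:
            self.fields[self._in] += data

# Hypothetical page for demonstration
html = """<html><head><title>SEO Tips</title>
<meta name="description" content="3 actionable hacks"></head>
<body><h1>Content Scraping for SEO</h1></body></html>"""

parser = SeoTagParser()
parser.feed(html)
print(parser.fields)
```

Run this over each URL in your list and write the rows to a CSV to get the same spreadsheet the Octoparse workflow produces.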

With the data now in Excel, we can do further analysis to optimize the web pages. In Excel, we can efficiently go through all the important factors:

  - Batch-check whether the length of each meta tag performs best in Google search results.
  - Batch-inspect the H1 tags, making sure each page has exactly one H1 and that its character length falls within an appropriate range.
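The two batch checks above are easy to script once the scraped fields are loaded from the export. A minimal sketch follows; the length ranges are commonly cited guidelines (roughly 30-60 characters for titles, 70-160 for meta descriptions), not Google rules, and the URLs and field names are hypothetical.

```python
# Batch-audit scraped tags: flag out-of-range lengths and pages
# that do not have exactly one H1.
LIMITS = {"title": (30, 60), "meta_description": (70, 160)}

def audit(rows):
    problems = []
    for url, fields in rows.items():
        for name, (lo, hi) in LIMITS.items():
            n = len(fields.get(name, ""))
            if not lo <= n <= hi:
                problems.append((url, name, n))
        if fields.get("h1_count", 1) != 1:
            problems.append((url, "h1_count", fields["h1_count"]))
    return problems

scraped = {  # e.g. rows loaded from the Excel export
    "https://example.com/a": {"title": "Short",
                              "meta_description": "x" * 120, "h1_count": 1},
    "https://example.com/b": {"title": "A" * 55,
                              "meta_description": "y" * 40, "h1_count": 2},
}
for issue in audit(scraped):
    print(issue)
```

Each tuple names the page, the failing check, and the measured value, so the fix list doubles as a to-do list.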

Here is the standard we could refer to at School4Seo.

Apart from the above, we can collect more information about your blogs, such as the category, share count, comment count, and so on, to uncover problems on your website.

Blog Content Curation

Content curation is the practice of selecting the most valuable pieces from across the web and adding value on top of the collected information. SEO is a popular application of content curation: curated content has become trendy on Google, helping websites rank better in search results.

How can Web Scraping help you curate the content?

A typical use case is RSS feed marketing. The advantage of RSS is pushing content out to your users automatically, rather than forcing them to visit your website every day. Now the question is, how do I get enough content for the RSS feed?

Imagine that you're a blogger focused on legal issues. Your audience consists of people with a keen interest in upcoming legal news or case study materials. In this case, web scraping can help you gather that information at a set frequency for RSS purposes.

For example, with Octoparse 8.1, we can gather the case information on a schedule and feed it into your RSS feed.
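The last mile, turning scraped items into an RSS document, needs no special library. Here is a minimal sketch using the Python standard library; the channel name, links, and item data are invented for illustration, and a production feed would also set fields like `pubDate` and `guid`.

```python
# Build a minimal RSS 2.0 feed from scraped items.
import xml.etree.ElementTree as ET

def build_rss(channel_title, link, items):
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = link
    for it in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = it["title"]
        ET.SubElement(node, "link").text = it["link"]
        ET.SubElement(node, "description").text = it["description"]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical scraped items for a legal-news blog
cases = [{"title": "New ruling on data privacy",
          "link": "https://example.com/case-1",
          "description": "Summary of the decision."}]
print(build_rss("Legal Watch", "https://example.com", cases))
```

Regenerate the file on the same schedule as your scraping task and serve it at a stable URL so feed readers pick up new items automatically.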

About XPath

If you fail to get the data you need, you may need to amend the XPath to precisely locate the element you want. That is because web pages have different structures, and one robot may not fit them all.

“XPath plays an important role when you use Octoparse to scrape data. Rewriting it can help you deal with missing pages, missing data, duplicates, etc. While XPath may look intimidating at first, it need not be. In this article, I will briefly introduce XPath and more importantly, show you how it can be used to fetch the data you need by building tasks that are accurate and precise.”
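To give a small taste of what an XPath expression does, here is a sketch using Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset (Octoparse and libraries like lxml accept full XPath). The sample markup is invented; the point is how a path with a predicate pins down exactly the elements you want while skipping look-alikes.

```python
# XPath-style selection: pick only the blog headings, skipping
# the ad block that also contains an <h2>.
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<html><body>
  <div class="ad"><h2>Sponsored</h2></div>
  <div class="blog"><h2>Scraping 101</h2><p>Intro post.</p></div>
  <div class="blog"><h2>XPath basics</h2><p>Locating nodes.</p></div>
</body></html>""")

# Roughly //div[@class='blog']/h2 -- the [@class='blog'] predicate
# filters out the ad's heading.
titles = [h2.text for h2 in page.findall(".//div[@class='blog']/h2")]
print(titles)
```

In Octoparse, pasting a corrected expression like this into a step's XPath setting is how you fix missing or duplicated fields.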

Final thoughts

Web scraping is tremendously helpful once you start exploring it, and all you need is a handy tool like Octoparse and some basic XPath knowledge. It can help you scrape almost all the information you need from nearly any website within minutes.

The best way to pick up a new skill is to learn by doing. Simply take some time to explore, and you will find it incredibly helpful one day.

