
How to Scrape WordPress Posts

Sunday, January 15, 2017 9:02 PM


In this web scraping tutorial, we will show you how to scrape posts from the Daily Post blog on WordPress.com. For each post, we will extract four data fields: the post title, the author name, the post introduction, and the published date. The target URL for this task is https://dailypost.wordpress.com/posts/.

 

Step 1. Create a new task with the sample URL

  • Enter the sample URL into the search bar on the home screen and click Start

 

Step 2. Create a pagination loop to click through multiple pages

  • Click the icon that links to older posts and select Loop click single URL from the Tips panel
  • Set the AJAX timeout to 3 seconds
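Conceptually, the pagination loop keeps following the older-posts link until there is none left. Below is a minimal Python sketch of that logic, not how Octoparse implements it. It assumes a hypothetical `fetch_page` callback that returns a dict whose `"next"` key holds the next page URL (or `None` on the last page). The `page/2/` URL in the fake site is also hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A page is modelled as a dict whose "next" key holds the URL of the
# older-posts page, or None on the last page (an assumption for this sketch).
Page = Dict[str, Optional[str]]


def crawl_pages(start_url: str,
                fetch_page: Callable[[str, float], Page],
                timeout: float = 3.0,
                max_pages: int = 100) -> List[str]:
    """Follow older-posts links until none remain; return the visited URLs."""
    visited: List[str] = []
    url: Optional[str] = start_url
    while url is not None and len(visited) < max_pages:
        if url in visited:  # stop if a link points back to a page we have seen
            break
        visited.append(url)
        # Pass the timeout to the fetcher, mirroring the AJAX timeout setting.
        page = fetch_page(url, timeout)
        url = page.get("next")
    return visited


# Stand-in for a real fetcher; the page/2/ URL pattern is hypothetical.
FAKE_SITE = {
    "https://dailypost.wordpress.com/posts/": {"next": "https://dailypost.wordpress.com/posts/page/2/"},
    "https://dailypost.wordpress.com/posts/page/2/": {"next": None},
}

print(crawl_pages("https://dailypost.wordpress.com/posts/",
                  lambda url, timeout: FAKE_SITE[url]))
```

The `max_pages` cap and the `visited` check are safety nets: they stop the loop if a site's "older" link ever points back to a page that has already been processed.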

 

Step 3. Create a loop item for the daily posts

  • Click the title of the first post and choose Select All from the Tips panel
  • Click Extract text from the selected links to extract the post titles; this builds the extraction loop
  • Click the loop item in the workflow and change its XPath to //div[@class="archive-list"]/article
  • Click the author name and choose Extract text from the selected links to extract it (there is no need to choose Select All, because the loop has already been built)
  • Repeat the previous step to extract the post introduction and the published date
  • In the Data Preview section, rename the data fields
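The loop item XPath selects each post, and every field is then located relative to that post. The Python sketch below shows this idea with the standard library's ElementTree, which supports only a subset of XPath (so the leading `//` becomes `.//`). The sample markup and the relative field paths (`./h2/a`, `./span[@class='author']`, and so on) are hypothetical and may not match the real page. Real pages are HTML rather than well-formed XML, so in practice you would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified markup standing in for one archive page.
SAMPLE_PAGE = """<html><body>
  <div class="archive-list">
    <article>
      <h2><a href="/p1">First post</a></h2>
      <span class="author">Alice</span>
      <p>Intro to the first post.</p>
      <time>January 14, 2017</time>
    </article>
    <article>
      <h2><a href="/p2">Second post</a></h2>
      <span class="author">Bob</span>
      <p>Intro to the second post.</p>
      <time>January 13, 2017</time>
    </article>
  </div>
</body></html>"""

# Loop item XPath from the tutorial, in ElementTree's limited syntax.
LOOP_ITEM = ".//div[@class='archive-list']/article"

# Field paths relative to each loop item (hypothetical).
FIELDS = {
    "title": "./h2/a",
    "author": "./span[@class='author']",
    "introduction": "./p",
    "date": "./time",
}


def extract_posts(page: str) -> List[Dict[str, str]]:
    root = ET.fromstring(page)
    records = []
    for article in root.findall(LOOP_ITEM):
        record = {}
        for name, path in FIELDS.items():
            node = article.find(path)
            # Empty string when the element is missing or has no text.
            record[name] = (node.text or "").strip() if node is not None else ""
        records.append(record)
    return records


for post in extract_posts(SAMPLE_PAGE):
    print(post)
```

Because the field paths are relative to each article, the fields of one post cannot get mixed up with the fields of the next. This is also why Select All is not needed again once the loop exists.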

 

Step 4. Check the workflow and element XPaths

Next, check the workflow by clicking each action in turn, starting from the beginning of the workflow. Make sure the content is actually scraped from every page.

If you notice any data missing from the Data Preview section, check the XPath for your data fields. Sometimes you need to write them manually. Check our new help portal for tutorials on XPath.
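Before you start editing XPaths, it helps to know which fields come back empty and in which rows. Here is a minimal Python sketch that assumes the scraped rows are available as a list of dictionaries. The field names and sample rows are hypothetical.

```python
from typing import Dict, List


def find_missing(records: List[Dict[str, str]],
                 fields: List[str]) -> Dict[str, List[int]]:
    """Map each field name to the row indices where it is empty or absent."""
    missing: Dict[str, List[int]] = {}
    for i, row in enumerate(records):
        for field in fields:
            if not (row.get(field) or "").strip():
                missing.setdefault(field, []).append(i)
    return missing


rows = [
    {"title": "First post", "author": "Alice", "date": "January 14, 2017"},
    {"title": "Second post", "author": "", "date": "January 13, 2017"},
]
print(find_missing(rows, ["title", "author", "date"]))  # {'author': [1]}
```

When a field is empty in every row, its XPath is probably wrong. When it is empty in only a few rows, those posts may simply use different markup.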

 

Step 5. Run the task and export the collected data

You can now run the task on your local machine or in the cloud. You can even schedule it to run on Octoparse's cloud platform.
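Octoparse handles the export itself. If you want to post-process the scraped records in your own scripts, the Python sketch below shows roughly what a CSV export could look like using the standard csv module. The field names and sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List


def to_csv(records: List[Dict[str, str]], fields: List[str]) -> str:
    """Serialize records to CSV text, one column per field name."""
    buffer = io.StringIO()
    # Missing keys become empty cells; unknown keys are ignored.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [{"title": "First post", "author": "Alice"}, {"title": "Second post"}]
print(to_csv(rows, ["title", "author"]), end="")
```

The csv module quotes values that contain commas. This matters for published dates such as "January 14, 2017".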

 

Happy Data Hunting!

Author: The Octoparse Team
