
Web Scraping Tutorials: Scraping Source Code from Web Pages

Thursday, March 9, 2017 8:50 PM

In this tutorial, we will show you how to use Octoparse to extract page-level data, including webpage URL, page title, meta description, meta keywords, and HTML source code.

How to add the data?

1. Click on the "Extract Data" action

2. Go to the "Data Preview" 

3. Click on the icon to add data field(s)

4. Hover over "Page-level data" and select the information that you want

The selected page-level data will be added as a field automatically to this "Extract Data" action.

5. Rename the data field as needed by double-clicking on the field name

Meaning of the fields

  • Page title: scrape the content of the title tag in the HTML

    It is a short description of the webpage that appears in the browser's title bar or tab.

  • Meta description: scrape the content of the meta description tag

    The tag contains a summary of the page content. 

  • Meta keywords: scrape the content of the meta keywords tag

    Scraping the page title, meta description, and meta keywords is useful when you need to analyze or improve a site's SEO.

  • HTML source code: the complete HTML code of the web page
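
To make the field definitions above concrete, here is a minimal sketch in Python, using only the standard-library html.parser module, of where each page-level field comes from in a page's HTML. It is not how Octoparse works internally. PageLevelParser, extract_page_level_data, and the sample page are hypothetical, and the sketch assumes the page declares its meta tags as <meta name="description"> and <meta name="keywords">.

```python
from html.parser import HTMLParser


class PageLevelParser(HTMLParser):
    """Hypothetical helper: collects the <title> text plus the
    meta description and meta keywords from one HTML document."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.title = ""
        self.meta_description = ""
        self.meta_keywords = ""
        self._in_title = False
        self._title_seen = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values may be None
        attrs = dict(attrs)
        if tag == "title" and not self._title_seen:
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self.meta_description = content.strip()
            elif name == "keywords":
                self.meta_keywords = content.strip()

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._title_seen = True

    def handle_data(self, data):
        # Only text inside the first <title> element is kept
        if self._in_title:
            self.title += data


def extract_page_level_data(url, html):
    """Return the five page-level fields listed in this tutorial."""
    parser = PageLevelParser()
    parser.feed(html)
    parser.close()
    return {
        "Page URL": url,
        "Page title": parser.title.strip(),
        "Meta description": parser.meta_description,
        "Meta keywords": parser.meta_keywords,
        "HTML source code": html,  # the raw page source, unchanged
    }


sample = """<html><head>
<title>Example Store</title>
<meta name="description" content="Handmade goods shipped worldwide.">
<meta name="keywords" content="handmade, gifts">
</head><body><p>Hello</p></body></html>"""

data = extract_page_level_data("https://www.example.com/", sample)
for field, value in data.items():
    if field != "HTML source code":
        print(f"{field}: {value}")
```

In this sketch the HTML source code field is simply the raw page returned unchanged, so the title, meta description, and meta keywords could always be re-derived from it later.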

If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you soon.

Happy Data Hunting!

Author: The Octoparse Team
