Get page-level data (metadata, URL, title & HTML)

Besides the information in a web page's body, Octoparse can also capture page-level data: the page URL, page title, meta description, meta keywords, and HTML source code.

Follow the steps below to add them:

STEP 1. Select an Extract Data action in the workflow

STEP 2. Go to the Data Preview section, then click the Add Custom Fields button

STEP 3. Select your target data field under Page-level data

STEP 4 (optional). Rename the data field by double-clicking on the field name

Five types of data can be added this way:

Page URL: URL of the current page

Page title: title of the current page, a short description of the page that appears at the top of the browser window

Meta description: meta description tag of the current page, which contains a summary of the page

Meta keyword: meta keywords tag of the current page, which lists keywords for the page

HTML source code: the complete HTML code of the web page
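To make the five fields concrete, here is a minimal Python sketch (standard library only) showing where each value lives in a page's HTML. It is an illustration under stated assumptions, not how Octoparse extracts these fields: the names `PageLevelParser` and `page_level_data` are hypothetical, and the sketch assumes the description and keywords come from `<meta name="description">` and `<meta name="keywords">` tags and the title from the `<title>` element.

```python
from html.parser import HTMLParser


class PageLevelParser(HTMLParser):
    """Hypothetical helper: collects <title> text and description/keywords <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Assumption: only name="description" and name="keywords" are of interest.
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text between <title> and </title> becomes the page title.
        if self._in_title:
            self.title += data


def page_level_data(url, html):
    """Hypothetical function: returns one row with the five page-level fields."""
    parser = PageLevelParser()
    parser.feed(html)
    parser.close()
    return {
        "Page URL": url,
        "Page title": parser.title.strip(),
        "Meta description": parser.meta.get("description", ""),
        "Meta keyword": parser.meta.get("keywords", ""),
        "HTML source code": html,
    }


# Usage example with a small, made-up page.
sample = """<html><head>
<title>Example Store</title>
<meta name="description" content="Handmade goods.">
<meta name="keywords" content="crafts, gifts">
</head><body><h1>Welcome</h1></body></html>"""

row = page_level_data("https://www.example.com/", sample)
print(row["Page title"])  # Example Store
```

A missing tag simply yields an empty string for that field, which mirrors how a page without a meta description has nothing to capture in that column.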
