
How to Scrape an HTML Table into Excel in Octoparse

Tuesday, July 5, 2016 3:11 AM


 

Tables are common on websites that cover finance, sports, and similar data-heavy topics. This tutorial shows how to scrape table data with Octoparse.

If you already know how to extract a list of data (see Extract a list), scraping a table works much the same way. Treat each table row as one element of the list, and each cell in that row as a sub-element of that element.
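
To make the row and sub-element idea concrete, here is a minimal Python sketch that reads a small HTML table into a list of rows, where each row is a list of cell values. It only illustrates the data model and is not how Octoparse works internally. The `TableRows` class and the sample table (company names and numbers) are hypothetical and made up for the example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect each <tr> as one list element; each <td>/<th> is a sub-element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample data, not taken from the case URL.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Price</th><th>Change</th></tr>
  <tr><td>Example Corp</td><td>10.50</td><td>+0.25</td></tr>
  <tr><td>Sample Inc</td><td>7.80</td><td>-0.10</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows  # first row holds the column names
```

Here `records` is the list, each item in it is one table row, and each string inside a row is one sub-element.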

How do you collect table data with Octoparse? Follow the steps below.

 

Case URL: https://money.cnn.com/data/hotstocks/index.html


 

1. Use the Auto-detect function to set up the workflow

2. Set up workflow manually

 

1. Use the Auto-detect function to set up the workflow

Octoparse can auto-detect a table and capture all of its columns. With this feature, you only need to:

  • Enter the web page URL and select "Auto-detect the web page data"
  • Check if all table cells have been captured and click "Create workflow"

 

Tip!

For details about auto-detect, see Lesson 1: Extract data with the brand-new Auto-detect algorithm.

 

2. Set up workflow manually

What if auto-detect fails or doesn't capture the complete table? In that case, you need to set up the task manually. Here are the steps:

  • Select the first cell in the first row of the table, and then click the "Expand the selection area" button until the whole first row is selected

(If auto-detect starts automatically, you can click "Turn OFF Auto-detect" or "Cancel Auto-detect" to stop it.)


The Tips panel will say "One or more sub-elements are found". "Sub-elements" are the specific data fields that Octoparse detects in each row of data. The panel is asking whether you want to locate these sub-elements.


  • Choose "Select all sub-elements" from the Tips panel. 

All the sub-elements in the first row are selected, and Octoparse highlights the other similar elements it finds in red.


  • Choose "Select all" from the Tips panel.

All the sub-elements in the table are selected and highlighted in green. 


 

 

  • Choose "Extract data" from the Tips panel. Now, Octoparse will extract all the data fields from the table. 


 

  • Edit data fields if needed (optional)

You now have all the data fields set up for the task. You can refine the data fields in the "Data Preview" section.

  • Double-click the field name to rename the data fields
  • Click the icon on the field for more actions: delete, copy, clean data, etc.
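
If you prefer to refine the fields after export instead of in "Data Preview", the same renaming and cleaning steps can be sketched in Python. This is only an illustration under stated assumptions: the default field names (`Field1`, `Field2`, `Field3`), the `RENAMES` mapping, the `refine` helper, and the sample rows are all hypothetical, and the cleaning shown (trimming whitespace and converting numbers) is one possible choice, not Octoparse's own "clean data" feature.

```python
import csv
import io

# Hypothetical extracted rows with placeholder field names (made-up data).
rows = [
    {"Field1": " Example Corp ", "Field2": "10.50", "Field3": "+0.25"},
    {"Field1": "Sample Inc", "Field2": "7.80", "Field3": "-0.10"},
]

# Hypothetical mapping from placeholder names to descriptive names.
RENAMES = {"Field1": "company", "Field2": "price", "Field3": "change"}


def refine(row):
    """Rename fields, trim whitespace, and convert numeric text to floats."""
    renamed = {RENAMES.get(key, key): value.strip() for key, value in row.items()}
    renamed["price"] = float(renamed["price"])
    renamed["change"] = float(renamed["change"])
    return renamed


refined = [refine(row) for row in rows]

# Write the refined rows as CSV text, a format that Excel can open.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["company", "price", "change"])
writer.writeheader()
writer.writerows(refined)
csv_text = buffer.getvalue()
```

To save a file instead of building a string, the same `csv.DictWriter` can write to a file opened with `open("table.csv", "w", newline="")`, where the file name is again only an example.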


 

 


 

 

If you have any trouble extracting table data, you're welcome to submit a ticket to our Support team.

 

Happy Data Hunting!

Author: The Octoparse Team
