How Web Crawlers Deal with List/Table Web Pages

In this article, I will show you how a web crawler extracts data from a list or table on a web page, using Octoparse as the example tool.

Step 1

Download and install the free edition of Octoparse. Register a new account and log in.

Step 2

On the start screen, the navigation panel on the left-hand side of the main interface lists all the folders. Here you can quickly start a task, manage all your tasks, and check their status. The operation panel offers two modes (“Wizard Mode” and “Advanced Mode”) and four types of web pages for each mode (“Single Page”, “List or Table”, “List & Detail”, “List of URL”).

Now, go to “Advanced Mode” > “List or Table” > “Start”.

Step 3

Enter a task name, and follow the prompt to click “Continue” > “Next”.

Step 4

Enter the target URL, follow the prompt to click “Continue”, and then click the “Go” icon to open the page in the browser.

Step 5

Click “Next” > “loop click next page”. This creates a loop action that processes every page of results, and the pagination action is added to the configuration rule.
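To make the idea of a pagination loop concrete, here is a minimal Python sketch of the same pattern: keep following the "next" link until there is none. It is not how Octoparse works internally. The `PAGES` dictionary, the URLs, and the `class="next"` link are all made up for illustration, and the regular expression is a crude stand-in for a real HTML parser.

```python
import re

# Hypothetical in-memory "site" (URL -> HTML). A real crawler would download these.
PAGES = {
    "/videos?page=1": '<ul><li>A</li></ul><a class="next" href="/videos?page=2">Next</a>',
    "/videos?page=2": '<ul><li>B</li></ul><a class="next" href="/videos?page=3">Next</a>',
    "/videos?page=3": "<ul><li>C</li></ul>",  # last page: no "next" link
}

# Assumed markup of the "next page" link; real sites will differ.
NEXT_LINK = re.compile(r'<a class="next" href="([^"]+)"')


def crawl_pages(start_url, fetch, max_pages=100):
    """Yield (url, html) for each page, following the "next" link until it disappears."""
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)          # guard against pagination cycles
        html = fetch(url)
        yield url, html
        match = NEXT_LINK.search(html)
        url = match.group(1) if match else None


visited = [url for url, _ in crawl_pages("/videos?page=1", PAGES.__getitem__)]
print(visited)
```

The `seen` set and `max_pages` limit are there so the loop stops even if a site's "next" link points back to an earlier page.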

Step 6

In this step, we click the first highlighted section.

Here, we want to build a list of sections that share a similar layout. Click “Create a list of items” and then “Add current item to the list”. The first highlighted section is now in the list. Next, click “Continue to edit the list”.

Then click the second highlighted section.

Click “Add current item to the list” again. The list now contains all the sections with similar layouts.

Then click “Finish Creating List”.

Finally, click “loop” to process the list and extract the elements in each section.
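One way to picture what happens when two example sections are added to the list is the sketch below: take the two examples, work out what they have in common (here, the tag name and the shared CSS classes), and select every element that matches. This is only an illustration of the idea, not Octoparse's actual matching logic. The sample HTML and the class names are invented.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record every opening tag as (tag, set_of_classes) in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.elements.append((tag, classes))


def similar_items(html, first_index, second_index):
    """Return the indexes of all elements that look like the two chosen examples."""
    collector = StartTagCollector()
    collector.feed(html)
    tag_a, classes_a = collector.elements[first_index]
    tag_b, classes_b = collector.elements[second_index]
    if tag_a != tag_b:
        raise ValueError("the two examples do not look alike")
    shared = classes_a & classes_b  # what both examples have in common
    return [
        i
        for i, (tag, classes) in enumerate(collector.elements)
        if tag == tag_a and shared <= classes
    ]


# Hypothetical page: three video sections between a header and a footer.
HTML = """
<div class="header">Top videos</div>
<div class="video first"><span>Song A</span></div>
<div class="video"><span>Song B</span></div>
<div class="video"><span>Song C</span></div>
<div class="footer">About</div>
"""

# Element 1 and element 3 play the role of the two sections clicked by hand.
matches = similar_items(HTML, 1, 3)
print(matches)
```

Two examples are more useful than one because the first section here carries an extra `first` class; intersecting the two class sets drops it. If the examples share no class at all, this sketch falls back to matching every element with the same tag, which would be too broad on a real page.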

Step 7

In this step, we extract the music video and its view count from the first section. Click each of these two elements and choose to extract its text.
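Extracting the same two fields from every section can be sketched in Python as follows. The snippet assumes well-formed markup so that the standard library's `xml.etree.ElementTree` can parse it; real HTML usually needs a more forgiving parser. The markup, the class names, and the field names `title` and `views` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup for two list sections.
SNIPPET = """
<ul>
  <li class="video"><span class="title">Song A</span><span class="views">1,200 views</span></li>
  <li class="video"><span class="title">Song B</span><span class="views">950 views</span></li>
</ul>
"""


def text_of(element):
    """Return the stripped text of an element, or an empty string if it is missing."""
    if element is None or element.text is None:
        return ""
    return element.text.strip()


def extract_fields(markup):
    """Apply the same two extraction rules to every section in the list."""
    root = ET.fromstring(markup)
    rows = []
    for item in root.findall(".//li[@class='video']"):
        rows.append(
            {
                "title": text_of(item.find("span[@class='title']")),
                "views": text_of(item.find("span[@class='views']")),
            }
        )
    return rows


rows = extract_fields(SNIPPET)
print(rows)
```

The rules are defined once, relative to a single section, and then reused for each section in the loop, which mirrors clicking the two elements in the first section only.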

Step 8

We can define the fields in the table on the right-hand side of the interface.

Before running the extraction rule, drag the “Loop Item” into the “Cycle Pages” box in the Workflow Designer so that we grab the elements of every section across multiple pages.

Then click “Next” > “Next” to proceed.
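Why the “Loop Item” has to sit inside “Cycle Pages” can be illustrated with two small Python functions. This is a sketch under simplifying assumptions: the pages are already parsed into a hypothetical `SITE` dictionary, and the function names are made up for the comparison.

```python
# Hypothetical pre-parsed pages: each holds its list items and the key of the next page.
SITE = {
    "page1": {"items": ["Song A", "Song B"], "next": "page2"},
    "page2": {"items": ["Song C"], "next": None},
}


def items_loop_outside_pages(site, start):
    """Item loop placed outside the page loop: it only ever sees the first page."""
    return list(site[start]["items"])


def items_loop_inside_pages(site, start):
    """Item loop nested inside the page loop: it runs once for every page."""
    collected, page = [], start
    while page is not None:
        for item in site[page]["items"]:
            collected.append(item)
        page = site[page]["next"]
    return collected


print(items_loop_outside_pages(SITE, "page1"))
print(items_loop_inside_pages(SITE, "page1"))
```

Only the nested version reaches the items on the second page.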

Step 9

Click “Local Extraction” > “OK” to run the task on your computer. Octoparse will then extract the data automatically.

In the data extraction panel, the target web page and the extracted data appear on the left-hand side. On the right-hand side of the panel, you can select extraction options to optimize the extraction process.

Click “Export Data” at the bottom of the data extraction panel and choose a format to save the file to your computer.
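For readers who script this step themselves, exporting extracted rows might look like the sketch below. It assumes CSV as the target format and uses made-up rows and field names; it does not describe the file Octoparse itself produces.

```python
import csv
import io

# Hypothetical extracted rows, one dictionary per list section.
ROWS = [
    {"title": "Song A", "views": "1,200 views"},
    {"title": "Song B", "views": "950 views"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(ROWS, ["title", "views"])
print(csv_text)
```

The `csv` module quotes values that contain a comma, such as `1,200 views`, so they stay in a single column. To write a file instead of a string, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the file object to `csv.DictWriter`.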

Now it’s done!
