How Web Crawlers Deal with List/Table Web Pages

Monday, December 20, 2021

In this article, I will show you how a web crawler extracts data from a list or table on a web page, using Octoparse as the example.


Step 1. Download and install the free edition of Octoparse. Register a new account and log in. 

Step 2. On the start screen, the navigation panel on the left-hand side of the main interface lists all the folders. Here you can quickly start a task, manage all your tasks, and check each task’s status. The operation panel offers two modes (“Wizard Mode” and “Advanced Mode”) and four types of web pages for each mode (“Single Page”, “List or Table”, “List & Detail”, “List of URL”).


Now, go to “Advanced Mode” > “List or Table” > “Start”.

Step 3. Enter a task name, and follow the prompt to click “Continue” > “Next”.   

Step 4. Enter the target URL, and follow the prompt to click “Continue” and the “Go” icon to open it in the browser.

Step 5. Click “Next” > “loop click next page” to create a loop action that processes all the pages. The pagination action is now added to the configuration rule.
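For readers who want to see what a "loop click next page" action amounts to, here is a minimal Python sketch. It is not Octoparse's implementation. The in-memory PAGES dictionary, the "next" class name, and the crawl_pages function are all assumptions made for illustration; a real crawler would download each page over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site": URL -> HTML. A real crawler would fetch pages over HTTP.
PAGES = {
    "/videos?page=1": '<ul><li>A</li></ul><a class="next" href="/videos?page=2">Next</a>',
    "/videos?page=2": '<ul><li>B</li></ul><a class="next" href="/videos?page=3">Next</a>',
    "/videos?page=3": "<ul><li>C</li></ul>",
}


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a class="next"> element, if there is one."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "next" in classes and self.next_href is None:
            self.next_href = attributes.get("href")


def crawl_pages(start_url, fetch=PAGES.get, max_pages=100):
    """Follow the "next page" link until it disappears; return the URLs visited."""
    visited = []
    url = start_url
    while url is not None and url not in visited and len(visited) < max_pages:
        html = fetch(url)
        if html is None:  # page not found: stop the loop
            break
        visited.append(url)
        finder = NextLinkFinder()
        finder.feed(html)
        url = finder.next_href  # None on the last page, which ends the loop
    return visited


print(crawl_pages("/videos?page=1"))
```

The loop stops when a page has no "next" link, when a URL repeats, or when max_pages is reached, so a misconfigured site cannot keep it running forever.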

Step 6. In this step, click the first highlighted section.

Here, we will create a list of sections that share a similar layout. Click "create a list of items" and then "Add current item to the list". The first highlighted section is now added to the list. Then click "Continue to edit the list".

Next, click the second highlighted section.

Click "Add current item to the list" again. The list now covers all the sections with a similar layout.

Then click "Finish Creating List".

Finally, click "loop" to process the list and extract the elements in each section.
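Why are two clicked sections enough to capture every similar section? One plausible explanation, sketched below in Python, is that the two element paths are compared and the part that differs is turned into a wildcard. This is an assumption for illustration only, not a description of Octoparse's internals; the path syntax and the generalize and matches functions are hypothetical.

```python
def generalize(path_a, path_b):
    """Merge two element paths into one pattern; steps that differ only by index become wildcards."""
    steps_a = path_a.strip("/").split("/")
    steps_b = path_b.strip("/").split("/")
    if len(steps_a) != len(steps_b):
        raise ValueError("paths have different depths")
    pattern = []
    for a, b in zip(steps_a, steps_b):
        if a == b:
            pattern.append(a)
            continue
        tag_a, tag_b = a.split("[")[0], b.split("[")[0]
        if tag_a != tag_b:
            raise ValueError("paths differ by tag, not only by index")
        pattern.append(tag_a + "[*]")
    return "/" + "/".join(pattern)


def matches(pattern, path):
    """Return True if the path fits the pattern, treating [*] as "any index"."""
    pattern_steps = pattern.strip("/").split("/")
    path_steps = path.strip("/").split("/")
    if len(pattern_steps) != len(path_steps):
        return False
    for p, s in zip(pattern_steps, path_steps):
        if p.endswith("[*]"):
            if s.split("[")[0] != p[:-3]:
                return False
        elif p != s:
            return False
    return True


# The two sections clicked in Step 6, written as hypothetical element paths.
pattern = generalize("/html/body/ul/li[1]", "/html/body/ul/li[2]")
print(pattern)
print(matches(pattern, "/html/body/ul/li[7]"))
```

With a pattern like this, any later section in the same list matches without being clicked.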

Step 7. In this step, we will extract the music video and views of the first section. Click these two elements and extract the text.
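Extracting the text of two elements from each section can be sketched in Python with the standard library's html.parser. The sample markup, the class names "item", "title", and "views", and the extract_rows function are assumptions for illustration; the real page will have its own structure.

```python
from html.parser import HTMLParser

# Hypothetical markup: each <li class="item"> is one section of the list.
SAMPLE = """
<ul>
  <li class="item"><span class="title">Song A</span> <span class="views">1,200 views</span></li>
  <li class="item"><span class="title">Song B</span> <span class="views">950 views</span></li>
</ul>
"""


class ItemExtractor(HTMLParser):
    """Collects the title and views text from every <li class="item">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # the section currently being read, if any
        self._field = None  # the field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "item" in classes:
            self._row = {"title": "", "views": ""}
        elif tag == "span" and self._row is not None:
            for name in ("title", "views"):
                if name in classes:
                    self._field = name

    def handle_data(self, data):
        if self._row is not None and self._field is not None:
            self._row[self._field] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._row is not None:
            self.rows.append({key: value.strip() for key, value in self._row.items()})
            self._row = None


def extract_rows(html):
    """Return one dictionary per section, with the extracted text fields."""
    parser = ItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


for row in extract_rows(SAMPLE):
    print(row)
```

Each dictionary corresponds to one row of the extracted data, with one key per field.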

Step 8. We can define fields in the table on the right-hand side of the interface.

Before running the extraction rule, drag the “Loop Item” into “Cycle Pages” in the Workflow Designer so that the item loop runs on every page and collects the elements of the sections from all the pages.

Then click “Next” > “Next” to proceed.
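Placing the “Loop Item” inside “Cycle Pages” amounts to nesting one loop inside another. The Python sketch below shows the idea with hypothetical page data standing in for real web pages; the PAGES list and the run_nested function are illustrative names, not part of Octoparse.

```python
# Hypothetical data: each inner list holds the sections found on one page.
PAGES = [
    [{"title": "Song A"}, {"title": "Song B"}],
    [{"title": "Song C"}],
]


def run_nested(pages):
    """Run the item loop inside the page loop, so every section on every page is visited."""
    rows = []
    for page_number, sections in enumerate(pages, start=1):  # plays the role of "Cycle Pages"
        for section in sections:                             # plays the role of "Loop Item"
            rows.append({"page": page_number, "title": section["title"]})
    return rows


for row in run_nested(PAGES):
    print(row)
```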

Step 9. Click “Local Extraction” > “OK” to run the task on your computer. Octoparse will then extract the data automatically.


In the data extraction panel, the target web page and the extracted data pane appear on the left-hand side. You can also adjust the extraction options on the right-hand side of this panel to optimize the extraction process.

Click “Export Data” at the bottom of the data extraction panel, choose a format, and save the file to your computer.
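As an illustration of what an export to CSV involves, here is a short Python sketch using the standard csv module. The rows, the field names, and the rows_to_csv function are hypothetical; the formats Octoparse actually offers are listed in its export dialog.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical extracted rows, one dictionary per section.
rows = [
    {"title": "Song A", "views": "1,200 views"},
    {"title": "Song B", "views": "950 views"},
]
print(rows_to_csv(rows, ["title", "views"]))
```

Values that contain a comma, such as "1,200 views", are quoted by the csv module so that they stay in a single column.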


Now it’s done!


Author: The Octoparse Team


