What is Auto-detect and how to use it?
Updated over a week ago

We have released several updates to the auto-detect webpage data feature that improve the recognition success rate and accuracy for webpage elements across nearly 200 popular domains:

  • Enhanced the accuracy and completeness of identifying valid elements.

  • Added the ability to recognize content that requires scrolling within a designated area of a web page.

  • Improved recognition success rate for scenarios involving pagination buttons, infinite scrolling, etc.

  • Implemented multilingual naming for certain commonly used fields.

What is Auto-detect?

The auto-detect feature is one of the newest additions in Octoparse version 8. With it, users can start a task by clicking a single button. The feature has been shown to handle web pages of different designs, including listings, tables, infinite scrolls, and load more buttons. This tutorial introduces the feature and shows how to use it.


How to use it?

1. Start a task with your target website URL

To start a task, enter the URL into the search box and click Start. In this tutorial, we will use this website as an example: https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276

2. Start the auto-detection

Once the website is fully loaded in the built-in browser, click on Auto-detect web page data from the tips panel to start the auto-detection.

3. Modify the settings

  • Remove unwanted data

Click on the trash icon in the Data preview to remove any unwanted data fields.

  • Confirm settings on the "Tips" panel

There will be options like "extract list", "paginate", and "page scroll" listed on the "Tips" panel.

  1. Extract the data in the list - This option is selected by default to help scrape the list of data on the page.

  2. Paginate to scrape more pages - This option locates a "Next page" button to get data from multiple pages.

  3. Add a page scroll - This option scrolls down the page after it loads.
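To make the first two options concrete, here is a minimal conceptual sketch in Python of what a "list extraction plus pagination" loop does. It is not Octoparse code and does not reflect how Octoparse is implemented; the `Page` class, the `SITE` data, and the `scrape` function are all hypothetical names invented for illustration, and the pages are in-memory stand-ins rather than real web pages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a loaded web page."""
    items: List[str]            # the list of data on the page
    next_page: Optional[str]    # target of the "Next page" button, if any


# Hypothetical two-page site used only for this illustration.
SITE = {
    "page1": Page(items=["Laptop A", "Laptop B"], next_page="page2"),
    "page2": Page(items=["Laptop C"], next_page=None),
}


def scrape(start: str, extract_list: bool = True,
           paginate: bool = True, max_pages: int = 10) -> List[str]:
    """Collect list items, following "Next page" links when paginate is on."""
    rows: List[str] = []
    current: Optional[str] = start
    visited = 0
    while current is not None and visited < max_pages:
        page = SITE[current]
        visited += 1
        if extract_list:
            rows.extend(page.items)
        # Without pagination, stop after the first page.
        current = page.next_page if paginate else None
    return rows
```

Calling `scrape("page1")` walks both pages, while `scrape("page1", paginate=False)` stops after the first one, which mirrors the effect of unchecking the pagination option.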


You can check/modify/unselect the settings.

a) Check the settings

Click Check under Paginate to scrape more pages, and the pagination button will be highlighted on the page.

b) Modify the settings

Click the "Edit" button under the option to modify the settings.


c) Uncheck the settings

If you don't need an option, just uncheck the box in front of it.

  • Click Create workflow

After confirming the options, you can choose Create workflow to generate the actions.

  • Rename the fields

You can double-click on the field header to rename it.

4. More scraping actions

Auto-detection configures a basic workflow that paginates and extracts data. If you'd like to click on each link to get more information, or click on a "Load More" button, you can select the corresponding options on the Tips panel to configure those actions.

  • Next page button - In case Octoparse does not recognize a pagination button, you can click on this option and select the button.

  • Load more button - If there is a load more button on the webpage, choose this option and select the button on the page so that the scraper clicks it automatically to load more data for scraping.

  • Infinite scroll - Choose this option to set the scroll method and the number of times to repeat the scroll.

  • Select subpage URL - If you want to click on the detected links and extract more information from the detail pages, choose this option and select a link you want to click on.
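As a rough mental model of two of these actions, the sketch below shows a "Load more" loop capped by a repeat count, followed by a step that visits each row's link and merges in fields from the detail page. This is a conceptual illustration only, not Octoparse's implementation; `BATCHES`, `DETAILS`, `load_listing`, and `add_subpage_fields` are hypothetical names, and the data is made up.

```python
from typing import Dict, List

# Hypothetical listing data: the first batch is visible on load,
# and each "Load more" click reveals the next batch.
BATCHES: List[List[Dict[str, str]]] = [
    [{"title": "Laptop A", "url": "/item/a"}],
    [{"title": "Laptop B", "url": "/item/b"}],
    [{"title": "Laptop C", "url": "/item/c"}],
]

# Hypothetical detail pages, keyed by the link found in each row.
DETAILS: Dict[str, Dict[str, str]] = {
    "/item/a": {"price": "$300"},
    "/item/b": {"price": "$450"},
    "/item/c": {"price": "$520"},
}


def load_listing(repeat_times: int) -> List[Dict[str, str]]:
    """Click "Load more" up to repeat_times, stopping early if nothing is left."""
    visible = list(BATCHES[0])
    for click in range(1, repeat_times + 1):
        if click >= len(BATCHES):
            break  # the button has nothing more to load
        visible.extend(BATCHES[click])
    return visible


def add_subpage_fields(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Follow each row's link and merge the detail-page fields into the row."""
    return [{**row, **DETAILS.get(row["url"], {})} for row in rows]
```

For example, `add_subpage_fields(load_listing(1))` yields two rows, each carrying both the listing title and the price taken from its detail page.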

5. Add missing data manually

Sometimes the auto-detect feature will miss some data fields, and you will need to add them manually. Just select the information on the web page and choose Text.

6. Save settings and start extraction

Click the Save button first to save all the settings you have made, then click Run to run your task either locally or in the cloud.
