We have some important new updates for the auto-detect webpage data feature to improve the recognition success rate and accuracy of webpage elements across nearly 200 popular domains:
Enhanced the accuracy and completeness of identifying valid elements.
Added the ability to recognize content that requires scrolling within a designated area of a web page.
Improved recognition success rate for scenarios involving pagination buttons, infinite scrolling, etc.
Implemented multilingual naming for certain commonly used fields.
What is Auto-detect?
The auto-detect feature is one of the newest innovations in Octoparse version 8. With it, users can start a task by clicking a single button. The feature has been shown to handle webpages of different designs, including listings, tables, infinite scrolls, load more buttons, etc. Now it's time to introduce this useful and powerful feature to our valued users.
How to use it?
1. Start a task with your target website URL
To start a task, enter the URL into the search box and click Start. In this tutorial, we will use this website as an example: https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276
2. Start the auto-detection
Once the website is fully loaded in the built-in browser, click on Auto-detect web page data from the tips panel to start the auto-detection.
3. Modify the settings
Remove unwanted data
Click on the trash icon in the Data preview to remove any unwanted data fields.
Confirm settings on the "Tips" panel
There will be options like "extract list", "paginate", and "page scroll" listed on the "Tips" panel.
Extract the data in the list - This option is selected by default to help scrape the list of data on the page.
Paginate to scrape more pages - This option locates a "Next page" button to help get data from multiple pages.
Add a page scroll - This option scrolls down the page after it loads.
You can check/modify/unselect the settings.
a) Check the settings
Click Check under Paginate to scrape more pages and you will see the button for pagination being highlighted.
b) Modify the settings
Click the "Edit" button under the option to modify the settings.
c) Uncheck the settings
If you don't need an option, just uncheck the box in front of it.
Click Create workflow
After confirming the options, you can choose Create workflow to generate the actions.
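As a rough mental model, the generated workflow behaves like a loop that extracts the list on the current page and then follows the "Next page" button until there is none left. The Python sketch below illustrates that idea over in-memory data only. It is not how Octoparse is implemented, and `PAGES`, `run_workflow`, `paginate`, and `max_pages` are hypothetical names chosen for this example.

```python
from typing import Optional

# Hypothetical in-memory "site": each page holds list items and the key of the next page.
PAGES = {
    "page1": {"items": ["Laptop A", "Laptop B"], "next": "page2"},
    "page2": {"items": ["Laptop C"], "next": None},
}

def run_workflow(start: str, paginate: bool = True, max_pages: int = 100) -> list[str]:
    """Collect list items page by page, following 'next' while pagination is enabled."""
    rows: list[str] = []
    current: Optional[str] = start
    visited = 0
    while current is not None and visited < max_pages:
        page = PAGES[current]
        rows.extend(page["items"])  # corresponds to "Extract the data in the list"
        visited += 1
        # corresponds to "Paginate to scrape more pages"; stop after one page if disabled
        current = page["next"] if paginate else None
    return rows
```

Calling `run_workflow("page1")` gathers items from both pages, while `run_workflow("page1", paginate=False)` mirrors unchecking the pagination option and stops after the first page.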
Rename the fields
You can double-click on the field header to rename it.
4. More scraping actions
Auto-detection sets up a basic workflow with pagination and data extraction. If you'd like to click on each link to get more information or click on a "Load More" button, you can select the options on the Tips panel to configure these actions easily.
Next page button - In case Octoparse does not recognize a pagination button, you can click on this option and select the button.
Load more button - If there is a load more button on the webpage, choose this option and select the button on the page so that the scraper automatically clicks it to load more data for scraping.
Infinite scroll - Use this option to set the scroll method and the number of times to repeat the scroll.
Select subpage URL - If you want to click on the detected links and extract more information from the detail pages, choose this option and select a link you want to click on.
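To make the "repeat times" idea behind the infinite scroll option more concrete, here is a small conceptual Python sketch. It assumes a hypothetical `load_batch` callback that returns whatever items are visible after each scroll, and it stops early when a scroll brings in nothing new. The early stop is an assumption made for this illustration and is not a description of Octoparse's internal behavior.

```python
from typing import Callable

def scroll_and_collect(load_batch: Callable[[int], list[str]], repeat_times: int) -> list[str]:
    """Scroll up to repeat_times and keep each item once, in the order first seen."""
    seen: list[str] = []
    for scroll_index in range(repeat_times):
        batch = load_batch(scroll_index)
        new_items = [item for item in batch if item not in seen]
        if not new_items:
            break  # assumption: nothing new loaded, so further scrolling is pointless
        seen.extend(new_items)
    return seen

# Hypothetical feed: the items that appear after each successive scroll.
FEED = [["a", "b"], ["b", "c"], []]

def fake_load(scroll_index: int) -> list[str]:
    """Return the batch for a given scroll, or an empty list once the feed is exhausted."""
    return FEED[scroll_index] if scroll_index < len(FEED) else []
```

With this fake feed, a larger repeat count does not add duplicates, and a repeat count of 1 returns only what the first scroll exposes.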
5. Add missing data manually
Sometimes auto-detection will miss some data fields, and you will need to add them manually. Just select the information on the web page and choose Text.
6. Save settings and start extraction
Click the Save button first to save all the settings you have made, then click Run to run your task either locally or in the cloud.