Incremental Extraction -- Get Updated Data with Clicks
Monday, September 12, 2016 9:43 PM
New Feature: Incremental Extraction (Available for Cloud Extraction)
After you’ve set up an extraction rule for a website, you may later need the data that has been added to that website since your previous extraction. The new Incremental Extraction feature lets you extract the updated data without configuring another rule.
Using the Incremental Extraction technique to extract updated data
. You don’t need to configure another rule to extract the new data.
. Updated data is identified by new URLs that are generated by new pages.
How does the program identify the updated data?
. The program identifies updated data only by the new URLs generated by new web pages.
. If the URL has been crawled, the program will skip that URL when running Incremental Extraction.
. When the URLs share the same format, you can choose a specific parameter to identify the updated data. For example, when you choose the parameter ?Page= , our program identifies the new URLs by the numbers that follow it.
. You don’t need to use Incremental Extraction when the updated data appears on the same pages you have already crawled, i.e. when the updated data is not generated by new pages.
. Incremental Extraction is available only when the task contains an “Extract text” action.
. Run your task at least once before using Incremental Extraction.
. Incremental Extraction is only available for Cloud Extraction.
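To make the rules above concrete, here is a minimal Python sketch of the idea. It is not Octoparse's actual implementation; the function names (page_key, incremental_urls), the in-memory set of crawled keys, and the example.com URLs are all assumptions made for illustration. A URL is skipped if it was crawled before, and when a parameter such as Page is chosen, URLs are compared by that parameter's value instead of by the whole address.

```python
from urllib.parse import urlparse, parse_qs


def page_key(url, param="Page"):
    """Return the value of the chosen query parameter, or the whole URL if it is absent."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else url


def incremental_urls(candidates, seen_keys, param="Page"):
    """Return only the URLs not crawled before, and remember them in seen_keys."""
    new_urls = []
    for url in candidates:
        key = page_key(url, param)
        if key in seen_keys:
            continue  # already crawled: skip this URL
        seen_keys.add(key)
        new_urls.append(url)
    return new_urls


# Hypothetical example: the first run crawls pages 1 and 2.
seen = set()
first_run = incremental_urls(
    ["http://example.com/list?Page=1", "http://example.com/list?Page=2"],
    seen,
)

# A later run sees page 3 as well; only that page is treated as updated data.
second_run = incremental_urls(
    [
        "http://example.com/list?Page=1&ref=home",
        "http://example.com/list?Page=2",
        "http://example.com/list?Page=3",
    ],
    seen,
)
print(second_run)
```

In this sketch the first run has to happen before the second one can skip anything, which mirrors the requirement to run the task at least once. Content that changes on an already crawled page is not picked up, because its URL stays the same.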
Note: If you want to record the extraction time, here’s the demo.
If you find that a feature is missing, please drop us a message (or just say hi).
Author: The Octoparse Team