What Is a Configuration Rule in Octoparse?

4/18/2016 11:47:12 PM


Generally, web crawlers like Google's retrieve every webpage they can reach. They follow links and read the content inside (usually text) to work out what each page is about, and in this way they build the index behind search results.

Crawlers run in Octoparse, by contrast, are driven by the rules you configure, and the data they extract is structured. Octoparse does not try to understand web content with advanced algorithms; it grabs exactly the content you point it to.

Today we’ll talk about what an Octoparse rule is.


The extraction rule is one of the most important features of Octoparse. The rule you configure tells Octoparse which website to open, where the data you plan to crawl is located, what kind of data you want, and so on.

You can configure a rule to paginate, to scrape a website behind a login, to collect data from webpages loaded with AJAX, or to scrape a website with infinite scrolling. None of this happens automatically, though: you have to build a rule for it.
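If it helps to picture what a rule holds, here is a minimal Python sketch of the three questions a rule answers. Everything in it (the `ExtractionRule` class, its field names, the `describe` helper, the example URL and selectors) is made up for illustration. It is not Octoparse's actual rule format, and you never need to write anything like this to use Octoparse.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExtractionRule:
    """Hypothetical model of a rule; NOT Octoparse's real file format."""
    start_url: str                       # which website to open
    item_selector: str                   # where the data sits on the page
    fields: Dict[str, str] = field(default_factory=dict)  # what data you want: column name -> selector
    next_page_selector: Optional[str] = None               # optional "next page" link


def describe(rule: ExtractionRule) -> List[str]:
    """List the steps the rule stands for, in the order they would run."""
    steps = [
        f"Open {rule.start_url}",
        f"Loop over items matching {rule.item_selector!r}",
    ]
    for name, selector in rule.fields.items():
        steps.append(f"Extract {name} from {selector!r}")
    if rule.next_page_selector:
        steps.append(f"Click {rule.next_page_selector!r} and repeat")
    return steps


# Example with placeholder values (assumed, not taken from a real site).
rule = ExtractionRule(
    start_url="https://example.com/products",
    item_selector="div.product",
    fields={"title": "h2", "price": "span.price"},
    next_page_selector="a.next",
)
for step in describe(rule):
    print(step)
```

The point of the sketch is only that a rule is structured information: a starting page, a place to look, and a list of things to capture.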

Trust me, it's very easy. If you can use a web browser, you can use Octoparse. Moreover, Octoparse has a visual workflow designer that shows how the rule is put together.

You do not need to write any code in Octoparse. Just tell Octoparse what you want it to do by dragging actions into the workflow designer and selecting options to optimize the process.


Let’s look at an example of a simple web page extraction with pagination.
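Conceptually, a paginated rule repeats the same extraction on each page and then follows the "next page" link until there is none left. The Python sketch below imitates that loop on a tiny in-memory stand-in for a website, so it runs without any network access. The `PAGES` data, the `run_paginated_extraction` function, and the `max_pages` safety limit are all assumptions made for illustration, not part of Octoparse.

```python
# Hypothetical in-memory "site": each page has some rows and a pointer
# to the next page (None on the last page).
PAGES = {
    "page1": {"rows": [{"title": "Item A"}, {"title": "Item B"}], "next": "page2"},
    "page2": {"rows": [{"title": "Item C"}], "next": None},
}


def run_paginated_extraction(pages, start, max_pages=100):
    """Collect rows page by page, following 'next' until it runs out."""
    results = []
    current = start
    visited = 0
    # max_pages guards against a site whose "next" links loop forever.
    while current is not None and visited < max_pages:
        page = pages[current]
        results.extend(page["rows"])   # extract the data on this page
        current = page["next"]         # then move on to the next page
        visited += 1
    return results


rows = run_paginated_extraction(PAGES, "page1")
print([row["title"] for row in rows])
```

In Octoparse itself you build the same loop by dragging actions into the workflow designer rather than by writing code.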



 Happy Data Hunting!









Author: The Octoparse Team



