About Octoparse

Friday, April 10, 2020

 (Updated 2020/2/14)

Octoparse is a modern visual web data extraction tool. Both experienced and inexperienced users will find it easy to extract information from websites in bulk. Most scraping tasks need no coding.

Octoparse supports Windows XP, 7, 8, and 10. It works well for both static and dynamic websites, including pages that use Ajax. Data can be exported in formats such as CSV, Excel, HTML, and TXT, or to databases (MySQL, SQL Server, and Oracle via API). Octoparse simulates human operation to interact with web pages.

Features such as filling out forms and entering search terms into text boxes make extracting web data straightforward. You can run an extraction project either on your local machine (Local Extraction) or in the cloud (Cloud Extraction).

Some of our clients use Octoparse’s cloud service, which can extract and store large amounts of data to meet large-scale extraction needs. 

The free and paid editions of Octoparse share some features. Paid editions allow users to extract very large amounts of data around the clock using Octoparse's cloud service. The prices of each plan are listed on the Octoparse website.

 

Workflow

Octoparse provides a visual operation pane that is user-friendly and straightforward. It simulates human browsing behavior such as opening a web page, logging into an account, entering text, and pointing and clicking on web elements. Click the information you want on the website in the built-in browser, start the extraction, and you will get the structured data you need.

Octoparse has two extraction modes: Task Template and Advanced Mode. Getting started takes only about half an hour, and people with programming experience will need even less time to become familiar with it.

 

Cloud Extraction

Octoparse's most powerful feature is large-scale concurrent scraping based on distributed computing. After you upload a scraping project to the cloud, you can run the extraction concurrently on many cloud servers. If you need to scrape 10,000 web pages within a short time, the Octoparse cloud service is a good fit. The Standard Edition limits you to 10 cloud servers, though that still greatly speeds up data extraction. You can also set up a schedule for regular data extraction.
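To get a feel for why running on several servers helps, here is a rough Python sketch that divides 10,000 pages among 10 workers. It only illustrates the arithmetic. How Octoparse actually distributes work across its cloud servers is not described in this post, and the function name split_into_batches and the example.com URLs are made up for illustration.

```python
def split_into_batches(urls, server_count):
    """Split a list of URLs into at most server_count nearly equal batches."""
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    # Round-robin slicing: worker i gets items i, i + n, i + 2n, ...
    batches = [urls[i::server_count] for i in range(server_count)]
    # Drop empty batches when there are fewer URLs than workers.
    return [batch for batch in batches if batch]


# Hypothetical input: 10,000 page URLs and 10 workers.
urls = [f"https://example.com/page/{n}" for n in range(1, 10001)]
batches = split_into_batches(urls, 10)
print(len(batches), len(batches[0]))  # 10 1000
```

With 10 workers each handling 1,000 pages at the same time, the wall-clock time is roughly that of scraping 1,000 pages rather than 10,000, ignoring any overhead.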

 

 

Advanced Mode

Advanced Mode provides a rich set of tools, including:

            # RegEx Tool #

            # XPath Tool #

            # Database Auto Export Tool #

            # API #

            ...

Octoparse provides a built-in RegEx generator. Refining scraped fields often requires regular expressions, and this tool helps with both generating and verifying them.
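As an illustration of what refining a scraped field with a regular expression means, here is a minimal Python sketch. It does not use or reproduce the Octoparse RegEx Tool; the sample field values, the pattern, and the function name extract_price are all assumptions made for this example.

```python
import re

# Hypothetical raw field values as they might come out of a scraping task.
raw_prices = ["Price: $1,299.00 (incl. tax)", "Now only $45.50!", "Call for price"]

# A dollar amount with optional thousands separators and optional cents.
PRICE_PATTERN = re.compile(r"\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)")


def extract_price(text):
    """Return the first dollar amount in text as a float, or None if absent."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


prices = [extract_price(value) for value in raw_prices]
print(prices)  # [1299.0, 45.5, None]
```

Verifying a pattern against a few real sample values, including one that should not match, is the same kind of check a RegEx verifier lets you do interactively.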

 

API

The Octoparse API makes it easy to connect your system to your scraped data in real time. You can either import the Octoparse data into your own database or use the API to request access to your own account's data. Just configure the rule for your task, and Octoparse cloud servers will do the rest. Data is returned as XML.

 

To use the Octoparse Standard API, you will need to hold a Standard or Professional account with at least one runnable task set up. Documentation: http://dataapi.octoparse.com/help

 

To use the Octoparse Advanced API, you will need to hold a Professional account with at least one runnable task set up. Documentation: http://advancedapi.octoparse.com/help
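Because the API returns data as XML, a client typically has to turn that XML into records before loading it into a database. The Python sketch below shows one way to do that with the standard library. The XML layout (rows, row, title, price) is invented for illustration and is not the real Octoparse response schema, so check the API documentation linked above for the actual structure. The function name xml_rows_to_dicts is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload; the real response schema is defined by the API docs.
SAMPLE_XML = """<rows>
  <row><title>Example product A</title><price>19.99</price></row>
  <row><title>Example product B</title><price>5.00</price></row>
</rows>"""


def xml_rows_to_dicts(xml_text):
    """Convert each <row> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "") for child in row}
        for row in root.findall("row")
    ]


records = xml_rows_to_dicts(SAMPLE_XML)
for record in records:
    print(record["title"], record["price"])
```

Each resulting dict maps field names to string values, which is a convenient shape for inserting into a database table or writing to CSV.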

 

Proxies

Does it drive you crazy when your IP address gets banned and you cannot access a website because you scrape it frequently? This happens especially when you extract data from business directories that apply strict anti-bot measures. Octoparse lets you scrape these websites by rotating anonymous HTTP proxy servers. In Cloud Extraction, Octoparse uses many third-party proxies for automatic IP rotation. For Local Extraction, you can add a list of external proxy addresses manually and configure them for automatic rotation.

IPs are rotated at the time interval you set. This lets you extract data from a website with less risk of getting your IP addresses banned.
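The idea of rotating through a proxy list at a fixed time interval can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Octoparse's implementation: the class name TimedProxyRotator is hypothetical, and the proxy addresses come from a range reserved for documentation.

```python
import itertools
import time


class TimedProxyRotator:
    """Hand out one proxy at a time, moving to the next after an interval."""

    def __init__(self, proxies, interval_seconds, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)  # wraps around at the end
        self._interval = interval_seconds
        self._clock = clock
        self._current = next(self._cycle)
        self._last_switch = clock()

    def current(self):
        """Return the active proxy, advancing once if the interval elapsed."""
        now = self._clock()
        if now - self._last_switch >= self._interval:
            self._current = next(self._cycle)
            self._last_switch = now
        return self._current


# Usage with a fake clock so the rotation is visible without waiting.
fake_now = [0.0]
rotator = TimedProxyRotator(
    ["203.0.113.10:8080", "203.0.113.11:8080"],  # placeholder addresses
    interval_seconds=30,
    clock=lambda: fake_now[0],
)
first = rotator.current()
fake_now[0] = 31.0  # pretend 31 seconds have passed
second = rotator.current()
print(first, second)
```

Passing the clock in as a parameter keeps the rotation logic testable; in real use the default time.monotonic would drive it.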

 

 

Author: The Octoparse Team 

 

