Tuesday, March 15, 2016
Octoparse is a modern visual web data extraction tool. Both experienced and inexperienced users will find it easy to use Octoparse to extract information from websites in bulk; for most scraping tasks, no coding is needed.
Octoparse is a Windows application that works well with both static and dynamic websites, including pages that load content with Ajax. Extracted data can be exported in a variety of formats, such as CSV, Excel, HTML, and TXT, or sent to databases (MySQL, SQL Server, and Oracle). Octoparse simulates human operations to interact with web pages.
Features such as filling out forms and entering search terms into text boxes make it much easier to extract web data. You can run an extraction project either on your own machine (Local Extraction) or in the cloud (Cloud Extraction).
Some of our clients use Octoparse’s cloud service, which can extract and store large amounts of data to meet large-scale extraction needs. The free edition and the paid editions share the same features. The free edition is limited to gathering small amounts of data from websites, while the paid editions let users extract large amounts of data around the clock (24/7) using Octoparse’s cloud service. The Standard Edition subscription costs $89/month or $890/year and is limited to 10 threads, while the Professional Edition subscription costs $189/month ($1,890/year).
Octoparse provides a visual operation pane that is user-friendly and straightforward. It simulates human browsing behavior such as opening a web page, logging into an account, entering text, and pointing and clicking on web elements. Just click the information you want on the website in the built-in browser and run the extraction, and you will get the structured data you need.
Its two modes, Wizard Mode and Advanced Mode, are among the most outstanding features of Octoparse. It takes only about half an hour to get started with Octoparse, and people with programming experience may need even less time to become familiar with it.
Large-scale, simultaneous web scraping based on distributed computing is the most powerful feature of Octoparse. After you upload your project configuration to the cloud, you can choose to run the extraction concurrently on many cloud servers. If you need to scrape 10,000 web pages in a short time, the Octoparse cloud service is the best fit. The Standard Edition limits you to 10 cloud servers, which still greatly speeds up data extraction. Scheduled extractions also let you export the scraped data.
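Octoparse distributes the work across its cloud servers internally, and its actual scheduling is not described in this post. As a rough illustration of why splitting a job helps, the sketch below divides a list of URLs round-robin into one batch per worker, assuming 10 workers to match the Standard Edition limit. The helper `split_into_batches` and the example.com URLs are hypothetical.

```python
from typing import List


def split_into_batches(urls: List[str], workers: int) -> List[List[str]]:
    """Hypothetical helper: assign URLs round-robin so each worker gets a near-equal share."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    batches: List[List[str]] = [[] for _ in range(workers)]
    for i, url in enumerate(urls):
        batches[i % workers].append(url)
    return batches


# 10,000 placeholder URLs spread across 10 workers
urls = [f"https://example.com/page/{n}" for n in range(10_000)]
batches = split_into_batches(urls, workers=10)
print([len(b) for b in batches])  # [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]
```

If pages take roughly similar time to load, ten workers handling 1,000 pages each should finish in something close to a tenth of the single-machine time; real speedups depend on the target site and network conditions.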
For advanced scraping, Octoparse provides a rich set of tools, including:
- Regex
- XPath editing
- Execution timeout settings
- Scrolling down
- Page anchor hook
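Octoparse's XPath editor works on the page loaded in its built-in browser. To show what an XPath expression selects, here is a small sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath and needs well-formed markup. The sample page and the `item`, `name`, and `price` class names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed product listing standing in for a scraped page
PAGE = """
<html><body>
  <div class="item"><span class="name">Desk lamp</span><span class="price">$19.99</span></div>
  <div class="item"><span class="name">Office chair</span><span class="price">$89.00</span></div>
</body></html>
"""


def extract_items(markup: str):
    root = ET.fromstring(markup.strip())
    items = []
    # ElementTree's XPath subset: descendant search plus an attribute predicate
    for div in root.findall(".//div[@class='item']"):
        name = div.find("span[@class='name']").text
        price = div.find("span[@class='price']").text
        items.append((name, price))
    return items


print(extract_items(PAGE))  # [('Desk lamp', '$19.99'), ('Office chair', '$89.00')]
```

Real-world HTML is often not well-formed XML, which is one reason a browser-based XPath editor is convenient in practice.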
To improve the user experience, Octoparse provides a built-in RegEx generator. Refining scraped fields may require applying regular expressions, and the generator is well suited both to generating RegExes and to verifying them.
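The post does not show the generator's output, so the following is only a sketch of what refining a scraped field with a regular expression can look like, written with Python's `re` module. The pattern, the `refine_price` helper, and the sample field values are assumptions chosen for illustration: the pattern pulls the first dollar amount out of a noisy text field.

```python
import re
from typing import Optional

# Hypothetical pattern: a dollar amount with optional thousands separators and cents
PRICE_PATTERN = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")


def refine_price(raw: str) -> Optional[float]:
    """Return the first dollar amount in a scraped text field as a float, or None if absent."""
    match = PRICE_PATTERN.search(raw)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


# Checking the pattern against sample values, much as a RegEx tester would
samples = ["Price: $1,299.00 (incl. VAT)", "Now only $89", "Call for price"]
for s in samples:
    print(s, "->", refine_price(s))
```

Testing a pattern against a handful of real field values before applying it to a whole extraction is the same "generate, then verify" loop the RegEx tool supports.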
When you start a project, remember to create it as an Advanced Mode task so that all of the tools listed above are available.
The Octoparse API makes it easy to connect your system to large amounts of data in real time. You can either import Octoparse data into your own database or use our API to request access to your own account’s data. Just configure the rule for your task, and Octoparse’s cloud servers will do the rest. Data are returned as XML.
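This post does not document the API's endpoints or XML schema. The sketch below assumes a purely hypothetical response shape (`<data>`, `<row>`, and `<field name="...">`) to show how an XML payload could be turned into records and stored in your own database, using only `xml.etree.ElementTree` and an in-memory `sqlite3` database. Consult the actual API documentation for the real element names.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical response body; the real Octoparse schema may differ
SAMPLE_RESPONSE = """<data>
  <row><field name="title">Desk lamp</field><field name="price">19.99</field></row>
  <row><field name="title">Office chair</field><field name="price">89.00</field></row>
</data>"""


def parse_rows(xml_text: str) -> List[Dict[str, str]]:
    """Turn each <row> into a dict keyed by each <field>'s name attribute."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iter("row"):
        rows.append({f.get("name"): (f.text or "") for f in row.findall("field")})
    return rows


def store_rows(rows: List[Dict[str, str]], conn: sqlite3.Connection) -> None:
    """Insert parsed records into a simple table (column names assumed)."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, price TEXT)")
    conn.executemany("INSERT INTO items (title, price) VALUES (:title, :price)", rows)
    conn.commit()


rows = parse_rows(SAMPLE_RESPONSE)
conn = sqlite3.connect(":memory:")
store_rows(rows, conn)
print(conn.execute("SELECT title, price FROM items").fetchall())
```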
Does it drive you crazy when your IP address gets banned and you can no longer access a website because you scraped it too frequently? This happens often, especially when you extract data from business directories that apply strict anti-bot measures. Octoparse lets you scrape such websites by rotating anonymous HTTP proxy servers. In Cloud Extraction mode, Octoparse uses many third-party proxies for automatic IP rotation. For Local Extraction, however, you have to add a list of external proxy addresses manually and configure them for automatic rotation. Click here to learn how to add IP rotation to a scraping project.
IPs are rotated at a time interval that you set. This way, you can extract data from a website with much less risk of getting your IP address banned.
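In Octoparse, rotation is configured in the application itself. The sketch below only illustrates the underlying idea of time-based rotation in Python: cycle through a proxy list and move to the next address once a set interval has passed. The `TimedProxyRotator` class is hypothetical, the addresses come from the reserved documentation range 203.0.113.0/24, and an injectable clock is used so the behavior can be demonstrated without waiting.

```python
import itertools
import time
from typing import Callable, List


class TimedProxyRotator:
    """Hypothetical helper: returns the current proxy, advancing once `interval` seconds have elapsed."""

    def __init__(self, proxies: List[str], interval: float,
                 clock: Callable[[], float] = time.monotonic):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._cycle = itertools.cycle(proxies)
        self._interval = interval
        self._clock = clock
        self._current = next(self._cycle)
        self._switched_at = clock()

    def current(self) -> str:
        # Advance by one proxy when the interval has elapsed since the last switch
        now = self._clock()
        if now - self._switched_at >= self._interval:
            self._current = next(self._cycle)
            self._switched_at = now
        return self._current


# Simulated clock so the rotation can be shown instantly
fake_now = [0.0]
rotator = TimedProxyRotator(
    ["http://203.0.113.10:8080", "http://203.0.113.11:8080", "http://203.0.113.12:8080"],
    interval=30.0,
    clock=lambda: fake_now[0],
)
seen = []
for t in (0, 10, 31, 45, 62, 95):
    fake_now[0] = float(t)
    seen.append(rotator.current())
print(seen)
```

In a real scraper, each request could then be routed through the current address with `urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))`.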
Author: The Octoparse Team
For more information about Octoparse, please click here.