
Use Octoparse to Download Web Data Easily - User Guide

Monday, January 25, 2021

 

About Octoparse

Octoparse is a modern visual web data extraction tool. Both experienced and inexperienced users find it easy to bulk-extract information from websites, and most scraping tasks require no coding at all.

Octoparse supports Windows XP, 7, 8, and 10. It works well for both static and dynamic websites, including pages that use AJAX. Extracted data can be exported in a variety of formats, such as CSV, Excel, HTML, and TXT, or delivered to databases (MySQL, SQL Server, and Oracle) via API. Octoparse simulates human operation to interact with web pages.
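As a quick illustration of downstream use (this is ordinary Python, not an Octoparse feature), a CSV file exported from Octoparse can be loaded and re-saved as an Excel workbook; the file names below are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical CSV export produced by an Octoparse task
data = pd.read_csv("octoparse_export.csv")

print(data.head())  # quick sanity check of the scraped rows

# Re-save the same data as an Excel workbook (requires openpyxl)
data.to_excel("octoparse_export.xlsx", index=False)
```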


Video: How to Extract Data from Website to Excel Automatically

 

Features such as filling out forms and entering search terms into text boxes make extracting web data an easy process. You can run your extraction project either on your local machine (Local Extraction) or in the cloud (Cloud Extraction).

Some of our clients use Octoparse’s cloud service, which can extract and store large amounts of data to meet large-scale extraction needs. 

Octoparse free and paid editions share many features in common. Paid editions allow users to extract enormous amounts of data on a 24/7 basis using Octoparse’s cloud service. The prices of each plan can be viewed here.

 

Workflow

 

Image: The Octoparse interface

Octoparse provides a visual operation pane that is user-friendly and straightforward. It simulates human browsing behavior such as opening a web page, logging into an account, entering text, and pointing and clicking on web elements. Simply click the information you want in the built-in browser, start the extraction, and you will get the structured data you need.

Octoparse offers two extraction modes: Task Template and Advanced Mode. It takes only about half an hour to get started, and users with programming experience will pick it up even faster.

 

Cloud Extraction

Large-scale, simultaneous web scraping based on distributed computing is Octoparse’s most powerful feature. After you upload your scraping project to the cloud, you can run the extraction concurrently on many cloud servers. If you need to scrape 10,000 web pages within a short time, the Octoparse cloud service is the best fit. The Standard Edition limits you to 10 cloud servers, but that still greatly speeds up data extraction. You can also set up a schedule for regular extraction runs.


Video: How to Extract Data From Millions of Web Pages in the Cloud 

 

Advanced Mode

Advanced Mode comes with a rich set of built-in tools, including:

- RegEx Tool
- XPath Tool
- Database Auto Export Tool
- API
- ...

To improve the user experience, Octoparse provides a built-in RegEx generator. Refining scraped fields often requires regular expressions, and the generator helps with both writing and verifying them.
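For instance, a scraped field often contains extra text around the value you actually want. The snippet below is a minimal Python illustration (the sample text and pattern are hypothetical, not produced by Octoparse) of the kind of refinement a regular expression performs:

```python
import re

# Hypothetical raw field captured from a product page
raw_field = "Price: $1,299.00 (free shipping)"

# Keep only the numeric price, including commas and decimals
match = re.search(r"\$([\d,]+\.?\d*)", raw_field)
if match:
    price = match.group(1).replace(",", "")
    print(price)  # -> 1299.00
```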

 
 

API

The Octoparse API makes it easy to connect your own systems to the scraped data in real time. You can either import the Octoparse data into your own database or use the API to request your account’s data programmatically. Just configure the rule for your task, and the Octoparse cloud servers will do the rest. Data are returned as XML.

web api data extraction

Video: How to Extract Data to Your Database via API

 

To use the Octoparse Standard API, you will need to hold a Standard or Professional account with at least one runnable task set up. Documentation: http://dataapi.octoparse.com/help

 

To use the Octoparse Advanced API, you will need to hold a Professional account with at least one runnable task set up. Documentation: http://advancedapi.octoparse.com/help
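As a rough sketch of what an API client could look like in Python, the snippet below obtains a token and then requests task data over HTTP. The endpoint paths, parameter names, and response format are assumptions for illustration only; check the documentation linked above for the actual routes.

```python
import requests

BASE = "http://dataapi.octoparse.com"  # Standard API base URL from the docs above

# 1) Obtain an access token (endpoint path and payload are assumed placeholders).
token_resp = requests.post(
    f"{BASE}/token",
    data={"username": "YOUR_USERNAME", "password": "YOUR_PASSWORD", "grant_type": "password"},
)
token_resp.raise_for_status()
access_token = token_resp.json().get("access_token")

# 2) Request extracted data for one task (endpoint and parameters are assumed placeholders).
data_resp = requests.get(
    f"{BASE}/api/alldata/GetDataOfTaskByOffset",
    headers={"Authorization": f"Bearer {access_token}"},
    params={"taskId": "YOUR_TASK_ID", "offset": 0, "size": 100},
)
data_resp.raise_for_status()
print(data_resp.text)  # raw response body (the article notes data are returned as XML)
```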

 

Proxies

Does it ever drive you crazy when your IP address is banned and you can no longer access a website because you scraped it too frequently? This happens especially when extracting data from business directories that apply strict anti-bot measures. Octoparse lets you scrape these websites by rotating anonymous HTTP proxy servers. In Cloud Extraction, Octoparse applies many third-party proxies for automatic IP rotation. For Local Extraction, you can manually add a list of external proxy addresses and configure them to rotate automatically; click here to learn how to include IP rotation in a scraping project.

IPs are rotated at an interval you set, so you can extract data from a website without risking an IP ban.
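To make the rotation idea concrete, here is a minimal, generic Python sketch (not Octoparse’s internal implementation) of cycling through a list of external proxy addresses at a fixed interval; the proxy addresses and target URLs are placeholders:

```python
import itertools
import time
import requests

# Placeholder external proxies; in Octoparse these would be entered in the task settings.
proxies = ["http://203.0.113.10:8080", "http://203.0.113.11:8080", "http://203.0.113.12:8080"]
rotation = itertools.cycle(proxies)

urls = ["https://example.com/page/%d" % i for i in range(1, 6)]
switch_interval = 30  # seconds between proxy switches
current_proxy = next(rotation)
last_switch = time.time()

for url in urls:
    # Switch to the next proxy once the interval has elapsed.
    if time.time() - last_switch >= switch_interval:
        current_proxy = next(rotation)
        last_switch = time.time()
    resp = requests.get(url, proxies={"http": current_proxy, "https": current_proxy}, timeout=10)
    print(url, resp.status_code)
```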

 

Check out this video to learn how Octoparse helps you avoid getting blacklisted or blocked when scraping websites.

 


Video: How to Scrape Websites Without Getting Blacklisted or Blocked

 

Author: The Octoparse Team 

 

Japanese article: What is Octoparse?
More web scraping articles are available on the official Japanese site.
Spanish article: About Octoparse
You can also read web scraping articles on the official Spanish site.


 

More Resources

 

Web Scraping Templates Take Away

Locate Element with XPath

Octoparse Regular Expression Tool (RegEx)

Deal with AJAX

Cloud Extraction: Scrape at Large Scale

Connect Octoparse API Step by Step

 

Download Octoparse to start web scraping, or contact us with any questions about web scraping!
