Lesson 0: Octoparse Basics

Hi there! Welcome to the brand new Octoparse version 8.5! This version includes some major updates, so we have put together a new learning series to help you grasp the new capabilities and improvements in the software.

Going through all the intro lessons will give you a thorough understanding of Octoparse 8.5 and enable you to scrape data from most web pages. Reading all the lessons normally takes 30 to 60 minutes.


1. The Interface

As soon as you log into Octoparse, you will find two main sections: the home page and the sidebar.

1.1 The Home Screen

There is a search bar at the top of the page where you can enter the target webpage URL(s) to start building a task.

Alternatively, enter a template name (such as Amazon or eBay) to search for a pre-built scraping template.

You can also access some of the most popular scraping templates and tutorials on the home page.

There is a support button in the bottom right corner. Use it to search for a tutorial or to start a quick chat with the Octoparse support team if you need help.

1.2 The Sidebar Menu

The sidebar menu on the left contains everything you need to navigate within Octoparse.

  • + New: create/import a new task or create new task groups.

  • Dashboard: where you can find all your scraping tasks. You can edit, delete, rename, and organize the tasks in your account, and run, stop, or schedule any of them.

  • Template: where you can find all the available templates.

1.3 The Workspace

The Octoparse workspace is where you build your tasks. It has five main parts, each serving a particular purpose.

  • The Built-in Browser: Once you have entered a target webpage URL, the webpage is loaded in Octoparse's built-in browser. You can browse the website in Browse mode or click to extract the data you need in Select mode.

  • Tips: Octoparse uses Smart Tips to "talk" to you and guide you through the task-building process.

  • The Workflow: As you interact with the webpage, for example by opening a page or clicking a page element or button, each step is recorded automatically in the form of a workflow.

  • Settings: Once you select an action in the workflow, the settings for that action are shown here.

  • Data Preview: Shows a preview of the selected data. You can also rename data fields or remove the ones you do not need.


2. Core Features

2.1 Scraping data with Task Templates

Task Templates are pre-built tasks that let you get data by entering simple parameters such as URL(s) or keywords. There are currently over 100 templates covering most mainstream websites. There is nothing to build and no technical skills are required. Simply select the template you need, check the sample data to confirm it returns what you want, and extract data right away!

2.2 Scraping data with Custom Task

Unlike task templates, where everything is pre-set, the Octoparse Custom Task is a highly flexible and powerful scraping mode that lets you build a scraping task customized to your specific requirements. The Custom Task is robust enough to scrape complicated web pages, such as pages that rely on JavaScript or AJAX and other dynamic websites.

Building your own scraping tasks with Custom Task need not be complicated or intimidating. With the new auto-detect algorithm, Octoparse automatically detects elements on a page and generates recommended task settings like extracting the list and going to the next page.
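For readers curious about what "extracting the list and going to the next page" amounts to, here is a minimal conceptual sketch in Python. It is not Octoparse's auto-detect algorithm or its internal workflow format, and you do not need to write any code to use Octoparse. The in-memory PAGES data, the "item" and "next" class names, and the ListPageParser and scrape_all names are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site" (URL -> HTML) so the sketch runs without a network.
PAGES = {
    "/products?page=1": (
        '<ul><li class="item">Kettle</li><li class="item">Toaster</li></ul>'
        '<a class="next" href="/products?page=2">Next</a>'
    ),
    "/products?page=2": '<ul><li class="item">Blender</li></ul>',
}


class ListPageParser(HTMLParser):
    """Collects the text of each <li class="item"> and the href of <a class="next">."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_url = None
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self._in_item = True
            self.items.append("")
        elif tag == "a" and attrs.get("class") == "next":
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def scrape_all(start_url, pages=PAGES):
    """Extract the list on each page, then follow the "next" link until there is none."""
    rows, url, seen = [], start_url, set()
    while url and url not in seen:  # 'seen' guards against pagination loops
        seen.add(url)
        parser = ListPageParser()
        parser.feed(pages[url])
        rows.extend(parser.items)
        url = parser.next_url
    return rows


print(scrape_all("/products?page=1"))
```

The two steps inside the loop (collect the list items, then move to the next page) correspond loosely to the kind of task settings the auto-detect step recommends. In Octoparse those steps are configured visually rather than written by hand.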


On top of the auto-detected data, you can always manually edit the task settings or build a task from scratch by skipping the auto-detect step.


Once you are satisfied with the auto-detected data, simply save the settings and Octoparse will generate the task workflow automatically. You can add extra steps to the workflow or modify the actions manually if needed.

2.3 Running tasks in the Cloud

Octoparse offers a powerful Cloud platform that lets premium users (Standard plan and above) run tasks 24/7. When you run a task with "Cloud extraction", it runs in the Cloud on multiple servers using Octoparse's IP addresses. You can shut down the app or your computer while the task is running, and there is no need to worry about hardware limitations.

Extracted data is saved in the Cloud and can be accessed at any time. Advanced features such as automatic IP rotation, task scheduling, faster extraction, and the Octoparse API are all part of the Octoparse Cloud service.

