What is a task

Thursday, August 16, 2018

To start a data scraping project in Octoparse, you first create a task that crawls the target pages and extracts the data you need.

A task in Octoparse is a crawler, usually built to scrape one website or a few of its categories, with no limit on the number of pages or URLs it crawls.

Octoparse simulates a real browsing experience: it clicks, searches, paginates, and so on. The task you configure determines which URL is opened, how many pages are retrieved, what data is collected…
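To make the idea concrete, a task can be pictured as a small configuration record. The sketch below is purely illustrative: Octoparse tasks are configured in the graphical interface, and the `TaskSketch` class, its field names, and the example URL are hypothetical, not part of any Octoparse API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskSketch:
    """Hypothetical model of the decisions a task captures (not an Octoparse API)."""
    start_url: str   # which URL to open
    max_pages: int   # how many pages to retrieve
    fields: List[str] = field(default_factory=list)  # what data to collect

    def summary(self) -> str:
        return (f"Open {self.start_url}, retrieve up to {self.max_pages} "
                f"page(s), collect: {', '.join(self.fields)}")


# Placeholder URL and field names, chosen only for illustration.
task = TaskSketch("https://example.com/products", 5, ["title", "price"])
print(task.summary())
```

Each attribute corresponds to one of the decisions listed above: the URL to open, the number of pages to retrieve, and the data to collect.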

This tutorial covers: 

1) "Advanced Mode" Task/ "Wizard Mode" Task

2) Workflow

3) Task management

1) "Advanced Mode" Task/ "Wizard Mode" Task

To create a task, click the "+ Task" button under "Advanced Mode" or "Wizard Mode".

We strongly recommend starting your data extraction project in "Advanced Mode". "Advanced Mode" offers more flexibility and lets you handle complex web scraping cases such as keyword searches, login authentication, opening dropdowns…

"Wizard Mode" lets you create very simple tasks with step-by-step guidance. Tasks created in "Wizard Mode" can later be edited in "Advanced Mode".

Octoparse "Wizard Mode" supports three kinds of wizards (extraction types):

      · List or Table

      · List and Details

      · Single Page

2) Workflow

The most critical part of a task is the workflow, which you build to match your specific data extraction requirements. Octoparse executes every step configured in the workflow, in order, to complete your data collection. In Octoparse 7.X, we’ve added a toggle button that lets you switch between "Select Mode" and "Workflow Mode".
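The "execute every configured step" idea can be sketched as a loop over an ordered list of steps. This is only a conceptual illustration under stated assumptions: the step names, the `run_workflow` function, and the in-memory `state` dictionary are hypothetical, and the sketch does not reflect how Octoparse is implemented internally.

```python
from typing import Callable, Dict, List, Tuple

# A step is a label plus an action that updates shared state (hypothetical model).
Step = Tuple[str, Callable[[Dict], None]]


def open_page(state: Dict) -> None:
    state["page"] = 1  # pretend the first page has been loaded


def extract_data(state: Dict) -> None:
    state["rows"].append(f"row from page {state['page']}")


def next_page(state: Dict) -> None:
    state["page"] += 1  # pretend the "next page" link was clicked


def run_workflow(steps: List[Step], state: Dict) -> List[str]:
    """Run each step in the configured order and return the labels executed."""
    executed = []
    for name, action in steps:
        action(state)
        executed.append(name)
    return executed


state = {"page": 0, "rows": []}
steps = [
    ("open page", open_page),
    ("extract data", extract_data),
    ("next page", next_page),
    ("extract data", extract_data),
]
executed = run_workflow(steps, state)
print(executed)
print(state["rows"])
```

Because the steps run strictly in the order configured, a misplaced step changes the result, which is why seeing the whole workflow while editing is useful.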

 

 

Tips!

For better task accuracy, we strongly suggest turning on "Workflow Mode". It gives you a clearer picture of what each step in your task does, which makes it easier to spot and fix steps that are out of order.

3) Task management

    1. Task information editing

In Octoparse 7.X, the task name is created automatically when you save the URL you entered.

      · To modify the task name, click the textbox above the workflow panel and enter a new name.

      · Or click the icon to edit the name of a saved task

      · Click below to edit the task description

    2. Task import/export

Click the import button to import a task saved anywhere on your computer.

To export a specific task:

      · Select the "More Actions" button

      · Select "Task"

      · Select "Export"

To batch export tasks:

      · Select multiple tasks (this also works with a single task selected).

      · Select "Manage selected task"

      · Select "Export"

    3. More task management actions

Here are further task management actions you might use.

Options for task management in "More Actions":

      · "Edit" – Edit task (Or double-click the task name on the dashboard to edit.)

      · "Delete" – Delete task (To batch delete, select multiple tasks and select "Delete" in "Manage selected task")

      · "Rename" – Rename task

      · "Edit with Advanced Mode" – Edit a task in "Advanced Mode" (only for tasks created in "Wizard Mode")

      · "Settings" – Basic settings (including task group and description) and extraction settings (including cloud task splitting, image loading, and ad blocking; browser user-agent switching; incremental cloud extraction)

      · "Copy" – Replicate task

      · "Export" – Export task

Related Articles:

What is concurrent run? 

Octoparse Advanced Mode 

Octoparse Wizard Mode 

Octoparse Cloud Extraction 

Octoparse Local Extraction 
