
What is a task in Octoparse?

Tuesday, January 18, 2022

Everything you do in Octoparse starts with building a task. In the wider world of web scraping, a scraping task is also called "a bot" or "an agent." Whatever it is called, a task is essentially a set of instructions for the program to follow.

Building a task in Octoparse is straightforward. First, load your target webpage in Octoparse and click to select the data you want to fetch. Once you have selected it, Octoparse auto-generates a workflow based on how you interacted with the webpage, for example, whether you clicked a certain button, hovered over the navigation menu, or clicked to select data on the page.

Octoparse simulates real browsing actions, such as clicking, searching, and paginating, by following the steps in the workflow until it reaches and fetches the target data. This is how Octoparse extracts data from a webpage.
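To make the idea of "a set of instructions" concrete, here is a minimal sketch in Python of a task as an ordered list of steps that a runner executes one by one. It is a toy model, not Octoparse's actual workflow format: the `Step` class, the `run_task` function, and the dictionary-based "site" standing in for real webpages are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy model: this is NOT Octoparse's internal workflow format.
@dataclass
class Step:
    action: str  # "open", "click", or "extract"
    target: str  # a URL, a link/button label, or a data field name

def run_task(steps: list[Step], site: dict) -> list[str]:
    """Execute workflow steps in order, the way a scraper follows its task.

    `site` is a fake website: each URL maps to the links and data on that page.
    """
    current_url = ""
    results = []
    for step in steps:
        if step.action == "open":
            current_url = step.target
        elif step.action == "click":
            # Clicking a link or "Next" button moves to the page it points to.
            current_url = site[current_url]["links"][step.target]
        elif step.action == "extract":
            results.append(site[current_url]["data"][step.target])
        else:
            raise ValueError(f"unknown action: {step.action}")
    return results

# Two listing pages connected by a "Next" button (pagination).
site = {
    "https://example.com/list": {
        "links": {"Next": "https://example.com/list?page=2"},
        "data": {"title": "Item A"},
    },
    "https://example.com/list?page=2": {
        "links": {},
        "data": {"title": "Item B"},
    },
}

task = [
    Step("open", "https://example.com/list"),
    Step("extract", "title"),
    Step("click", "Next"),
    Step("extract", "title"),
]

print(run_task(task, site))  # ['Item A', 'Item B']
```

A real scraper drives an actual browser and locates elements on the rendered page, but the control flow is similar: each step changes which page the "browser" is on or what it has collected, and the order of the steps determines the result.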

 

Advanced Mode vs. Task Templates

There are two ways to create a scraping task in Octoparse: you can build a task in Advanced Mode, or start right away with a Task Template.

Advanced Mode

With Advanced Mode, you can customize your scraping task in any way you like, such as searching with keywords, logging into your account, clicking through a dropdown menu, and much more. Simply put, Advanced Mode has almost everything you need to scrape data from a website.

Task Templates

Unlike Advanced Mode, Task Templates provide a large number of preset scraping templates for some of the most popular websites. Because these tasks are pre-built, you only need to enter a few variables, such as a search term or the target page URL, to fetch a predefined set of data from that website.
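One way to picture a template is as a pre-built task with placeholders that are filled in from the variables you supply. The sketch below assumes a hypothetical search-page template: `SEARCH_TEMPLATE`, `fill_template`, and the `shop.example.com` URL are illustrative only and are not real Octoparse templates. It uses Python's standard `string.Template` for the substitution.

```python
from string import Template

# Hypothetical template: the steps are fixed, only $keyword is left open.
SEARCH_TEMPLATE = [
    ("open", "https://shop.example.com/search?q=$keyword"),
    ("extract", "product_name"),
    ("extract", "price"),
]

def fill_template(template, **variables):
    """Return a concrete task by substituting the user's variables.

    Raises KeyError if a required variable is missing. Values are inserted
    as-is, so a real implementation would also URL-encode them.
    """
    return [(action, Template(target).substitute(variables))
            for action, target in template]

task = fill_template(SEARCH_TEMPLATE, keyword="laptop")
print(task[0])  # ('open', 'https://shop.example.com/search?q=laptop')
```

The design point is that the user never edits the steps themselves; they only supply the values the template leaves open.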

 

Ready to get your hands on some data? Follow the introductory lessons for step-by-step guidance on how to create your first task.  

 

Note:

  1. Version 8 comes with a newly designed task editing interface, and the auto-detect feature is exclusive to version 8.
  2. You can use the auto-detect feature to generate a basic workflow first, then modify or optimize it to meet your needs.
  3. Usually, one task (crawler) is used to scrape data from one website (or URLs under one domain), because a task can only scrape data from pages with a similar page structure. However, you can try scraping email addresses from a list of websites with a single crawler; see this tutorial for reference: Can I extract email addresses from a series of websites without similarities?
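Why one task usually maps to one website can be illustrated with a toy extraction rule. In this sketch (an assumption for illustration, not how Octoparse stores its rules), a page is a nested dictionary and a rule is a fixed path to the data, similar in spirit to an XPath. The names `extract`, `PRICE_RULE`, and the sample pages are hypothetical.

```python
def extract(page: dict, path: list[str]):
    """Follow a fixed path of keys through the page; return None if it breaks."""
    node = page
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None  # the page is laid out differently than the rule expects
        node = node[key]
    return node

# Hypothetical rule "recorded" on site A.
PRICE_RULE = ["body", "div.product", "span.price"]

site_a_page1 = {"body": {"div.product": {"span.price": "$10"}}}
site_a_page2 = {"body": {"div.product": {"span.price": "$12"}}}
site_b_page = {"body": {"section.item": {"p.cost": "$9"}}}  # different structure

print(extract(site_a_page1, PRICE_RULE))  # $10
print(extract(site_a_page2, PRICE_RULE))  # $12
print(extract(site_b_page, PRICE_RULE))   # None
```

The same rule works on every page that shares the structure it was recorded on, and finds nothing on a page laid out differently, which is why a separate task is normally built for each site.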

 

Tips for managing your tasks

  1. Editing task information

 A task name is created automatically when you save the URL you entered.

   · To change the task name, click the text box above the workflow panel and enter a new name.

   · Or click  to edit the name of a saved task.

 

2. More task management actions

Here are more task management actions you might use.

Options for task management in "More Actions"

      · "Edit" – Edit task (Or double-click the task name on the dashboard to edit.)

      · "Delete" – Delete task 

      · "Rename" – Rename task

      · "Settings" – Basic settings (including task group and description) and extraction settings (including cloud task splitting, image loading, and ad blocking; browser user agent switching; and incremental cloud extraction)

      · "Duplicate" – Replicate task

      · "Export" – Export task

 

To batch manage tasks:

      · Select multiple tasks (this also works with a single task).

      · Choose one of the available options to apply it to all selected tasks.

      · To clear the selection, click "Unselected."

 

 Happy Data Hunting!

 

Author: The Octoparse Team
