What is a task in Octoparse

Everything you do in Octoparse starts with building a task. A scraping task in Octoparse is also referred to as "a bot", "an agent", or "a crawler". Regardless of what it is called, a task is essentially a set of instructions for the program to follow. One task usually scrapes a page or multiple pages with the same page design.

Building a task in Octoparse is straightforward. First, load your target webpage in Octoparse and click to select the data you want to fetch. Once you've finished selecting the data, a workflow is auto-generated based on how you've interacted with the webpage, such as clicking a certain button, hovering over the navigation menu, or selecting data on the page.

Octoparse simulates real browsing actions, such as clicking, searching, and paginating, by following the steps in the workflow until it reaches and fetches the target data. This is how Octoparse extracts data from a webpage.


Custom Task vs. Task Template

There are two ways to create a scraping task in Octoparse. You can build a task under Custom Task, or pick a Task Template right off the bat.

Custom Task

With Custom Task, you'll get to customize your own scraping task in any way you like, such as searching with keywords, logging into your account, clicking through a dropdown, and much more. To put it simply, Custom Task is all you need to scrape data from any website.

Task Template

Unlike Custom Task, Task Template provides a large number of pre-set scraping templates for some of the most popular websites. These tasks are pre-built, so you only need to input certain variables, such as the search term and the target page URL, to fetch a pre-defined set of data from the particular website.

Ready to get your hands on some data? Follow the introductory lessons for step-by-step guidance on how to create your first task.

NOTE:

  1. The interfaces of Version 7 and Version 8 are different; the auto-detect feature is only available in Version 8.

  2. You can use the auto-detection feature to generate a basic workflow first, then modify or optimize it to meet your needs.

  3. Usually, one task/crawler is used to scrape data from one website (or URLs under one domain), because a single task/crawler can only scrape pages with a similar page structure. That said, you can try scraping email addresses from a list of dissimilar websites with one crawler; see this tutorial for reference: Can I extract email addresses from a series of websites without similarities?


Tips on managing your tasks

1. Task Information Editing

The task name is automatically generated from the URL entered when you save the task.

  • To modify the task name, click the textbox above the workflow panel and enter a new name.

  • Or click the edit button to rename a saved task.

2. More Actions for Task Management

Quick actions

  • "Duplicate" – Replicate the task

  • "Delete" – Delete a task

More actions

  • "Export" – Export task file. The task file can be saved on your device or sent to the support team for troubleshooting.

  • "Task ID (API)" – The ID of the task. It can be used in API calls or sent to the support team for troubleshooting.

  • "Local Run" – Options for Local Run, like Start/Stop, or Schedule

  • "Cloud Run" – Options for Cloud Run, like Start/Stop, Schedule, or Cloud Run History

  • "Move to Group" – Move the task to another group

  • "View Data" – View the Cloud or Local data

  • "More Actions" – Edit, Rename, and Task Settings
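
The Task ID mentioned above is what identifies your task in API calls. As a minimal sketch of how it might be used, the snippet below builds a request for a task's extracted data. The base URL, endpoint path, and parameter names here are assumptions for illustration only; check the Octoparse OpenAPI documentation for the actual interface and authentication flow.

```python
from urllib.parse import urlencode

# Hypothetical base URL for illustration; verify against the official docs.
BASE_URL = "https://openapi.octoparse.com"

def build_data_request(task_id: str, access_token: str, size: int = 100):
    """Build the (url, headers) pair for fetching a task's extracted data.

    The endpoint path and query parameter names are assumed, not taken
    from official documentation.
    """
    query = urlencode({"taskId": task_id, "size": size})
    url = f"{BASE_URL}/data/all?{query}"          # hypothetical endpoint
    headers = {"Authorization": f"Bearer {access_token}"}
    return url, headers

# The task ID below is a placeholder; copy the real one from
# "Task ID (API)" in the task's action menu.
url, headers = build_data_request("example-task-id", "<your-token>")
print(url)
```

The request is only constructed here, not sent; in practice you would pass `url` and `headers` to an HTTP client such as `requests`.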

To batch manage tasks:

  • Select multiple tasks (selecting a single task also works).

  • Choose from the available options to apply an action to the selected tasks in batch.
