Everything you do in Octoparse starts with building a task. What Octoparse calls a scraping task is known elsewhere in the world of scrapers as a “bot” or an “agent”. Whatever the name, a task is essentially a set of instructions for the program to follow.
Building a task in Octoparse is straightforward. You first load your target webpage in Octoparse, then click to select the data you want to fetch. Once you finish selecting, a workflow is auto-generated from how you interacted with the webpage: for example, whether you clicked a certain button, hovered over the navigation menu, or clicked to select data on the page.
Octoparse simulates real browsing actions: it clicks, searches, paginates, and so on until it reaches and fetches the target data, following the steps in the workflow. This is how Octoparse extracts data from a webpage.
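To make the idea of a workflow concrete, here is a minimal Python sketch that models a task as an ordered list of recorded steps. It is only an illustration of the concept, not Octoparse's internal format or API: the `Step` and `Task` classes, the action names, and the example.com URL are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One recorded interaction (hypothetical structure, not Octoparse's)."""
    action: str  # e.g. "open_page", "click", "extract"
    target: str  # a URL or a description of the page element


@dataclass
class Task:
    """A named, ordered set of instructions for the scraper to follow."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, target: str) -> None:
        # Each interaction with the page appends one step to the workflow.
        self.steps.append(Step(action, target))

    def run(self) -> List[str]:
        # A real scraper would drive a browser here; this sketch only
        # returns the steps in the order they would be executed.
        return [f"{i}. {s.action}: {s.target}"
                for i, s in enumerate(self.steps, start=1)]


task = Task("Example task")
task.record("open_page", "https://example.com/products")
task.record("click", "Next page button")
task.record("extract", "Product title")
workflow_log = task.run()
```

The point of the sketch is the ordering: the steps run in exactly the sequence in which the interactions were recorded.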
Advanced Mode vs. Task Templates
There are two ways to create a scraping task in Octoparse. You can build a task under Advanced Mode or start from a Task Template.
Advanced Mode
With Advanced Mode, you can customize your scraping task in any way you like, such as searching with keywords, logging into your account, clicking through a dropdown, and much more. Simply put, Advanced Mode has almost everything you need to scrape data from any website.
Task Templates
Unlike Advanced Mode, Task Templates provide a large number of pre-set scraping templates for some of the most popular websites. These tasks are pre-built, so you only need to input certain variables, such as the search term or the target page URL, to fetch a pre-defined set of data from that particular website.
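Conceptually, a template is a pre-built task with a few blanks left for you to fill in. The following Python sketch illustrates that idea under stated assumptions: the `SEARCH_TEMPLATE` dictionary, its field names, the `build_task_from_template` helper, and the example.com URL are hypothetical and do not reflect how Octoparse stores its templates.

```python
from string import Template
from urllib.parse import quote_plus

# Hypothetical pre-built template: the URL pattern and the set of fields
# are fixed in advance; only the search term is supplied by the user.
SEARCH_TEMPLATE = {
    "name": "Example search template",
    "url": Template("https://example.com/search?q=$search_term"),
    "fields": ["title", "price", "rating"],
}


def build_task_from_template(template, **variables):
    """Fill the template's variables and return a ready-to-run task spec."""
    # URL-encode each value so spaces and symbols are safe in a query string.
    encoded = {key: quote_plus(str(value)) for key, value in variables.items()}
    return {
        "name": template["name"],
        "start_url": template["url"].substitute(**encoded),
        "fields": list(template["fields"]),
    }


task = build_task_from_template(SEARCH_TEMPLATE, search_term="wireless mouse")
```

Here the only input is the search term; the start URL and the list of extracted fields come from the template itself.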
Ready to get your hands on some data? Follow the introductory lessons for step-by-step guidance on how to create your first task.
Note:
- Version 8 comes with a newly designed task editing interface, and the auto-detect feature is exclusive to version 8.
- You can use the auto-detect feature to get a basic workflow first, then modify or optimize it to meet your own needs.
- Scraping data from one website (or from URLs under one domain) usually takes one task/crawler, because a single task/crawler can only scrape pages with a similar page structure. You can, however, try scraping email addresses from a list of websites with one crawler. See this tutorial for reference: Can I extract email addresses from a series of websites without similarities?
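If you are starting from a mixed list of URLs, one way to plan your tasks is to group the URLs by domain first, so that each group can go into its own task. The sketch below is a standalone Python helper, not an Octoparse feature; the function name and sample URLs are assumptions, and sharing a domain is only a rough proxy for sharing a page structure.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_urls_by_domain(urls):
    """Group URLs by host name, one group per planned task/crawler."""
    groups = defaultdict(list)
    for url in urls:
        # netloc is the host part of the URL, e.g. "shop.example.com".
        domain = urlparse(url).netloc.lower()
        groups[domain].append(url)
    return dict(groups)


urls = [
    "https://shop.example.com/item/1",
    "https://shop.example.com/item/2",
    "https://news.example.org/article/42",
]
planned_tasks = group_urls_by_domain(urls)
```

Pages under the same domain can still differ in layout (a product page versus a search results page, for instance), so check each group before putting it into a single task.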
Tips for managing your tasks
1. Task information editing
A task name is automatically created when you save the URL entered.
· To modify the task name, click the textbox above the workflow panel and enter a new name.
2. More task management actions
The following task management options are available under “More Actions”:
· “Edit” – Edit task (Or double-click the task name on the dashboard to edit.)
· “Delete” – Delete task
· “Rename” – Rename task
· “Settings” – Basic settings (including task group and description) and extraction settings (including cloud task splitting, image loading, and ad blocking; browser user agent switching; and incremental cloud extraction)
· “Duplicate” – Replicate task
· “Export” – Export task
To batch manage tasks:
· Select one or more tasks.
· Choose one of the available options to apply it to all selected tasks.
· To clear the selection, click “Unselected”.