What is a task
Wednesday, November 24, 2021
A newer version of this tutorial is available. Check it for the latest instructions.
To start a data scraping project in Octoparse, you first create a task that crawls the target pages and extracts the data you need.
A task in Octoparse is a crawler, usually scoped to one website or a few of its categories, with no limit on the number of pages or URLs it can crawl.
Octoparse simulates a real browsing experience: it clicks, searches, paginates, etc. The task you configure determines which URL is opened, how many pages are retrieved, what data is collected…
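To make the idea concrete, the settings a task captures can be pictured as a small data structure. The sketch below is only an illustration: the `ScrapeTask` class, its field names, and the `?page=N` pagination scheme are assumptions made for this example, not Octoparse's actual task format or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScrapeTask:
    """Hypothetical model of a scraping task (not Octoparse's internal format)."""
    name: str                      # label shown to the user
    start_url: str                 # which URL to open first
    max_pages: int = 1             # how many pages to retrieve
    fields: List[str] = field(default_factory=list)  # what data to collect

    def page_urls(self) -> List[str]:
        """List the URLs to visit, assuming simple ?page=N pagination."""
        return [f"{self.start_url}?page={n}" for n in range(1, self.max_pages + 1)]


# Example: a task that reads three listing pages and collects two fields.
task = ScrapeTask(
    name="products",
    start_url="https://example.com/list",
    max_pages=3,
    fields=["title", "price"],
)
urls = task.page_urls()
```

In Octoparse itself these choices are made through the interface rather than in code; the sketch only shows what kind of information a task holds.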
This tutorial covers the two task creation modes, the task workflow, and task management.
1) "Advanced Mode" Task/ "Wizard Mode" Task
To create a task, click the "+ Task" button under "Advanced Mode" or "Wizard Mode".
We strongly recommend "Advanced Mode" for starting your data extraction project. "Advanced Mode" offers more flexibility and lets you handle complex web scraping cases such as keyword searches, login authentication, opening dropdowns…
"Wizard Mode" lets you create very simple tasks with step-by-step guidance. Tasks created in "Wizard Mode" can later be edited in "Advanced Mode".
Octoparse "Wizard Mode" supports 3 kinds of wizards (extraction types):
· List or Table
· List and Details
· Single Page
2) Workflow
The most critical part of a task is the workflow, which encodes your specific data extraction requirements. Octoparse executes every step configured in the workflow, in order, to complete your data collection. Octoparse 7.X adds an on/off button for switching between "Select Mode" and "Workflow Mode".
For better task accuracy, we strongly suggest turning on "Workflow Mode". It gives you a clearer picture of what each step of the task does, which helps if you get the steps wrong.
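The way a workflow runs its configured steps one after another can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the step functions (`open_page`, `extract_data`, `click_next`), the `state` dictionary, and `run_workflow` are hypothetical names invented for this example and do not correspond to Octoparse's implementation.

```python
from typing import Callable, Dict, List

# A step is any function that reads and updates the shared state.
Step = Callable[[Dict], None]


def open_page(state: Dict) -> None:
    """Pretend to load the current page."""
    state["log"].append(f"open page {state['page']}")


def extract_data(state: Dict) -> None:
    """Pretend to collect one row from the current page."""
    state["rows"].append({"page": state["page"]})
    state["log"].append(f"extract page {state['page']}")


def click_next(state: Dict) -> None:
    """Pretend to click the pagination button."""
    state["page"] += 1
    state["log"].append("next")


def run_workflow(steps: List[Step], state: Dict, pages: int) -> Dict:
    """Run every configured step, in order, once per page."""
    for _ in range(pages):
        for step in steps:
            step(state)
    return state


result = run_workflow(
    [open_page, extract_data, click_next],
    {"page": 1, "log": [], "rows": []},
    pages=2,
)
```

Because the steps run strictly in the configured order, a misplaced step (for example, paginating before extracting) changes what gets collected, which is why seeing the whole workflow at once is useful.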
3) Task management
1. Task information editing
In Octoparse 7.X, the task name is created automatically when you save the URL you entered.
· To modify the task name, click the textbox above the workflow panel and enter a new name.
· Or click the corresponding icon to edit the name of a saved task.
· Click the icon below it to edit the task description.
2. Task import/export
Click the import button to import a task saved anywhere on your computer.
To export a specific task:
· Select "More Actions" button
· Select "Task"
· Select "Export"
To batch export tasks:
· Select one or more tasks
· Select "Manage selected task"
· Select "Export"
3. More actions of task management
The "More Actions" menu offers the following task management options:
· "Edit" – Edit task (Or double-click the task name on the dashboard to edit.)
· "Delete" – Delete task (To batch delete, select multiple tasks and select "Delete" in "Manage selected task")
· "Rename" – Rename task
· "Edit with Advanced Mode" – Edit the task in "Advanced Mode" (only for tasks created with "Wizard Mode")
· "Settings" – Basic settings (including task group and description) and extraction settings (including cloud task splitting, image loading, ad blocking, browser user-agent switching, and incremental cloud extraction)
· "Copy" – Replicate task
· "Export" – Export task