What is Custom Task?

Note: Starting from Octoparse version 8.5.4, Advanced Mode has been renamed Custom Task.

Custom Task lets you scrape data from websites using simple point-and-click actions, with no code. If the webpages you want to scrape are a bit more complicated, or if auto-detect has not extracted the data successfully, we strongly recommend giving Custom Task a try. With it, you can:

  • Scrape information from nearly any web page

  • Extract data like text, URL, image, and HTML

  • Interact with webpages to perform complicated actions such as logging in, searching by keyword, and switching between the options of a drop-down menu

  • Fine-tune your workflow, such as adding wait time, modifying XPath, and reformatting the extracted data


Start a task in Custom Task

There are two ways to quickly start a new task using Custom Task:

1) On the home page, enter the URL(s) of the target web page(s) and click Start.

2) Right under the Octoparse logo, hover over + New and select Custom Task.


Get to know the Custom Task interface

  • Built-in Browser: Once you've entered a target webpage URL, the webpage is loaded in Octoparse's built-in browser. You can browse the website in Browse mode, or click to extract the data you need in Select mode.

  • Workflow: As you interact with the webpage, for example by opening a web page or clicking a page element or button, Octoparse automatically records the entire process as a workflow.

  • Tips panel: Octoparse uses smart Tips to "talk" to you and guide you through the task-building process.

  • Data Preview: Preview the selected data. You can also rename the data fields or remove the ones that are not needed.


How to use Custom Task to build tasks manually

To build a task manually using Custom Task, simply click on the target data on the webpage, then follow the tips in the Tips panel to continue building the task. The general steps are straightforward:

Select the data you need on the webpage >> Follow the instructions provided in the Tips panel >> Check your workflow >> Run the task to get data

Web pages change all the time, and different people need different sets of data. Custom Task is designed with the flexibility and versatility to handle a wide range of scraping needs while remaining friendly to non-coders, with step-by-step guidance provided in Action Tips.


1. Select your target data on the web page

Within the built-in browser, use simple clicks to select any data you'd like to extract from the webpage. As you hover over the web page, Octoparse tries to "understand" what you'd like to fetch and highlights the page elements around your cursor. If the highlighted area does not quite match what you'd like to extract, move your cursor slightly.

Once the data you need is highlighted in blue, click to confirm the selection. The selected page element should now be highlighted in green, indicating that it has been selected successfully.

Repeat the same process if you'd like to extract multiple elements on the same page.

2. Follow the instructions provided in the Tips panel

Octoparse guides you through the task-building process by offering the possible next steps in the Action Tips panel. It is a way for Octoparse to "talk" to you.

Every time you select an element, the Action Tips panel pops up with a number of options to choose from. Simply follow the instructions and choose how you'd like to proceed with the selected data. For example, if you'd like to scrape the text of the selected elements, choose "Text"; if you'd like to click on the selected element to go to the linked page, choose "Click element".

Below are the most frequently used actions:

  • Text - Capture the text of the selected page element

  • Click element - Click the selected page element

  • InnerHtml & OuterHtml - Capture the HTML source of the selected element, either without its own enclosing tag (InnerHtml) or including it (OuterHtml)

  • Loop click - Click the selected element repeatedly (similar to Loop click next page)

  • Link - Capture the URL of the selected link (when a link is selected)

  • Image URL - Capture the image URL (when an image is selected)
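To make the capture actions above more concrete, here is a minimal Python sketch of what each one roughly corresponds to in HTML terms. It is only an illustration using Python's standard library, not how Octoparse works internally; the page fragment, the URLs, and the `extract_fields` helper are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (real pages are rarely this tidy).
SNIPPET = """
<div class="product">
  <a href="https://example.com/item/1"><b>Blue</b> Widget</a>
  <img src="https://example.com/img/1.png" alt="Blue Widget"/>
</div>
"""


def extract_fields(snippet: str) -> dict:
    """Return roughly what each capture action would yield for the snippet."""
    root = ET.fromstring(snippet)
    link = root.find("./a")     # XPath-style path to the selected link
    image = root.find("./img")  # XPath-style path to the selected image

    # InnerHtml: the markup inside the element, without its own tag.
    inner = (link.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in link
    )
    return {
        "Text": "".join(link.itertext()).strip(),
        "InnerHtml": inner.strip(),
        # OuterHtml: the element's own tag plus everything inside it.
        "OuterHtml": ET.tostring(link, encoding="unicode").strip(),
        "Link": link.get("href"),
        "Image URL": image.get("src"),
    }


if __name__ == "__main__":
    for name, value in extract_fields(SNIPPET).items():
        print(f"{name}: {value}")
```

The paths `./a` and `./img` are simple XPath-style expressions: they describe where an element sits on the page, which is the kind of locator you adjust when modifying XPath in a task.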

Tips:

  • In instances where a target element is difficult to pinpoint with the cursor, you can use the HTML tags located at the bottom of the Tips panel to refine the selection.

  • The Expand the selection button at the end can be used to expand the current selection to include the outer HTML tag. For example, if you'd like to extract the entire part surrounding the selected element, you can keep clicking on the expand button until the entire part gets highlighted in green.
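Expanding the selection can be thought of as moving from the selected element to the element that encloses it. The sketch below illustrates that idea with Python's standard library, assuming a made-up page fragment; `expand_selection` and `tags_while_expanding` are hypothetical helpers, not part of Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a title nested inside a list item.
SNIPPET = "<ul><li><h3><span>Title</span></h3><p>Price</p></li></ul>"


def expand_selection(root, selected):
    """Return the element one level out, or `selected` if it is the root."""
    # ElementTree has no parent pointers, so build a child -> parent lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return parent_of.get(selected, selected)


def tags_while_expanding(root, selected, clicks):
    """List the tag selected after each simulated click on the expand button."""
    tags = [selected.tag]
    for _ in range(clicks):
        selected = expand_selection(root, selected)
        tags.append(selected.tag)
    return tags


root = ET.fromstring(SNIPPET)
span = root.find(".//span")

if __name__ == "__main__":
    # Two expansions move the selection from the title text to the whole item.
    print(tags_while_expanding(root, span, 2))
```

Each expansion widens the selection by one enclosing tag, so the extracted text grows from the title alone to everything inside the list item.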

3. Check the workflow

As you build the scraping task, Octoparse simultaneously creates a workflow based on how you've interacted with the web page and with the Tips panel.

An example workflow:

Tip: Check out this tutorial to learn more about how to test your workflow step-by-step: Lesson 4: Test-run the task
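If it helps to picture a workflow, you can think of it as an ordered list of steps, where a loop step contains the steps that are repeated. The Python sketch below models that idea only; the `Step` class, the step labels, and the URL are assumptions made for illustration and do not reflect Octoparse's actual task format.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step in a hypothetical workflow; loop steps hold nested steps."""
    action: str
    target: str = ""
    children: list = field(default_factory=list)


# Illustrative workflow: open a page, then repeat two captures per page.
WORKFLOW = [
    Step("Open web page", "https://example.com/products"),
    Step("Loop click", "Next page button", children=[
        Step("Extract Text", "product title"),
        Step("Extract Link", "product link"),
    ]),
]


def outline(steps, depth=0):
    """Return the workflow as indented lines, one line per step."""
    lines = []
    for step in steps:
        label = f"{step.action}: {step.target}" if step.target else step.action
        lines.append("  " * depth + label)
        lines.extend(outline(step.children, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(WORKFLOW)))
```

Reading the outline from top to bottom mirrors how you would check a workflow: each step runs in order, and the indented steps run once per loop iteration.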

4. Run the task

Now that you've finished building and testing your task, you can run the task by clicking the Run button. You can run the task on your device or run it in the Cloud.
