Wizard Mode

Thursday, August 16, 2018

What’s Wizard Mode?

Wizard Mode is a simple way to scrape based on pre-built templates, and it is especially useful for anyone new to web scraping. Its built-in wizards guide you step by step through setting up a scraping task for your specific requirements. Wizard Mode makes scraping easier and faster by pre-defining the general scraping process for a few common page structures. For websites with more complex structures, such as those that require a login or a keyword search, use Advanced Mode instead, which lets you configure the workflow with more flexibility.

 

In this tutorial, we will show you how to use the three extraction types in Wizard Mode to scrape web data easily.

1) Scrape from "List or Table" - extract a list or table from a single page or multiple pages

2) Scrape from "List and Detail" - extract information from item pages by clicking the links on a list

3) Scrape from "Single Page" - extract data from a single web page

 

 

 

1) Scrape from "List or Table" - extract a list or table from a single page or multiple pages

1. Create a task in Wizard Mode

  · Click on "+ Task" 

 

  · Enter the URL and click on "Next" 

 

2. Select extraction type

  · Select "List or Table", then click "Next" 

    

Now that you've selected the extraction type, Octoparse will walk you through each step of the workflow. The overall progress is shown at the top right of the interface.

 

3. Define list: specify the list containing the target data

  · Click an item on the list, then click another item from the same list. Octoparse identifies all the items automatically and adds them to the text box.

  · Click "Next" to proceed to the next step: Define field

 

Tips!

When selecting an item on the list, always make sure that all the desired data fields are selected/highlighted. In this example, we intend to extract three data fields from each item.
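To see why two clicks are enough to define a list, here is a minimal Python sketch of the idea. It is not how Octoparse works internally; it only assumes that items in the same list share a parent, a tag, and a class. The sample markup, the "generalize" helper, and the field values are all hypothetical, and the standard-library XML parser is used only because the sample is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real site).
PAGE = """
<div>
  <h2 class="title">Products</h2>
  <ul>
    <li class="item"><a href="/p/1">Kettle</a> <span class="price">$25</span> <span class="rating">4.5</span></li>
    <li class="item"><a href="/p/2">Toaster</a> <span class="price">$40</span> <span class="rating">4.1</span></li>
    <li class="item"><a href="/p/3">Blender</a> <span class="price">$60</span> <span class="rating">3.9</span></li>
  </ul>
</div>
"""

def generalize(root, first, second):
    """Return every element sharing the parent, tag, and class of the two samples."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of.get(first)
    if parent is None or parent is not parent_of.get(second):
        raise ValueError("the two samples are not in the same list")
    signature = (first.tag, first.get("class"))
    if signature != (second.tag, second.get("class")):
        raise ValueError("the two samples do not look alike")
    return [el for el in parent if (el.tag, el.get("class")) == signature]

root = ET.fromstring(PAGE)
samples = root.findall(".//li")                    # stand-ins for the two clicked items
items = generalize(root, samples[0], samples[1])   # all three list items
names = [el.findtext("a") for el in items]
```

Each selected item here contains all three fields (name, price, rating), which is what the tip above asks you to check before moving on.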

 

4. Define field: specify which data fields to capture

  · Click the target data; it will then be shown in "Data field"

  · Edit the field name

  · Click "Next" to move on to the next step: Pagination
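Conceptually, defining fields means giving each piece of data inside an item a name. The sketch below shows that idea in Python under stated assumptions: the markup, the field names, and the "extract_rows" helper are made up for illustration, and the paths use the small XPath subset supported by the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed list markup.
ITEM_LIST = """
<ul>
  <li><a href="/p/1">Kettle</a><span class="price">$25</span><span class="rating">4.5</span></li>
  <li><a href="/p/2">Toaster</a><span class="price">$40</span><span class="rating">4.1</span></li>
</ul>
"""

# Field name -> path relative to each list item (both are illustrative choices).
FIELDS = {
    "product_name": "a",
    "price": "span[@class='price']",
    "rating": "span[@class='rating']",
}

def extract_rows(markup, fields):
    """Build one dict per list item, keyed by the chosen field names."""
    root = ET.fromstring(markup)
    return [
        {name: item.findtext(path) for name, path in fields.items()}
        for item in root.findall("li")
    ]

rows = extract_rows(ITEM_LIST, FIELDS)
```

Renaming a field in the wizard corresponds to changing a key in "FIELDS"; the extracted values stay the same.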

 

5. Pagination: tell Octoparse whether you need to scrape from a single page or multiple pages

In Wizard Mode, pagination is disabled by default. If you are scraping data from a single page, click "Next" to continue.

If you need to scrape from multiple pages, select "Enable pagination", then define the "Next page" button by clicking on it.

Now click "Next" in the navigation bar to proceed to the next step.
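The pagination choice can be pictured as a loop that either stops after the first page or keeps following the "Next page" link until there is none. The Python sketch below illustrates this with an in-memory stand-in for a website; the URLs, the "next" class name, and the "scrape" function are assumptions made for the example, and a real scraper would download each page instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical site: URL -> well-formed markup.
PAGES = {
    "/list?page=1": '<div><ul><li>Kettle</li><li>Toaster</li></ul>'
                    '<a class="next" href="/list?page=2">Next</a></div>',
    "/list?page=2": '<div><ul><li>Blender</li></ul>'
                    '<a class="next" href="/list?page=3">Next</a></div>',
    "/list?page=3": '<div><ul><li>Mixer</li></ul></div>',
}

def scrape(start_url, paginate=False, max_pages=50):
    """Collect list items from one page, or from every page when paginate is True."""
    results, seen, url = [], set(), start_url
    # The seen set and max_pages guard against loops and runaway crawls.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(PAGES[url])
        results.extend(li.text for li in root.iter("li"))
        next_link = root.find(".//a[@class='next']")
        # Follow the "Next page" link only if pagination is enabled and a link exists.
        url = next_link.get("href") if (paginate and next_link is not None) else None
    return results

single_page = scrape("/list?page=1")
all_pages = scrape("/list?page=1", paginate=True)
```

With pagination left disabled, only the first page is read; with it enabled, the loop ends on the last page because no "Next page" link is found there.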

 

6. Complete

Task configuration is now complete. You can run the task with Local Extraction or Cloud Extraction.

 

 

 

2) Scrape from "List and Detail" - extract information from item pages by clicking the links on a list

1. Create a task in Wizard Mode

  · Click on "+ Task" 

 

  · Enter the URL and click "Next" 

2. Select extraction type

  · Select "List and Detail", then click "Next" 

     

Octoparse will now walk you through each step of the workflow. The overall progress is shown at the top right of the interface.

 

3. Define list: specify the list of items that link to the detail pages

  · Click an item on the list, then click another item on the same list. Octoparse identifies all the items automatically and adds them to the text box.

  · Click "Next" to move on to the next step: Pagination

 

4. Pagination: tell Octoparse whether you need to scrape from a single page or multiple pages

  · Single page: Octoparse disables pagination by default in Wizard Mode, so you can simply click "Next" to continue.

 

  · Multiple pages: Select "Enable pagination", then define the "Next page" button by clicking on it.

   Now click "Next" in the navigation bar to proceed to the next step.

           

 

5. Define field: specify the data fields to extract

Unlike scraping directly from "List or Table", here Octoparse clicks each link on the list and takes you to the detail page, where you define the fields.

  · Click the target data; it will then be shown in "Data field"

  · Edit the field name

  · Click "Next" to complete the process
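The "List and Detail" flow can be sketched as two nested steps: read the links from the list page, then open each linked page and extract the fields there. The Python example below is only an illustration of that pattern; the pages are hypothetical in-memory strings, and the "fetch" argument stands in for whatever would download a page in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical list page and detail pages, all well-formed.
LIST_PAGE = '<ul><li><a href="/p/1">Kettle</a></li><li><a href="/p/2">Toaster</a></li></ul>'
DETAIL_PAGES = {
    "/p/1": '<div><h1>Kettle</h1><p class="sku">K-100</p></div>',
    "/p/2": '<div><h1>Toaster</h1><p class="sku">T-200</p></div>',
}

def scrape_list_and_detail(list_markup, fetch):
    """Visit every link on the list page and extract fields from its detail page."""
    rows = []
    for link in ET.fromstring(list_markup).iter("a"):
        url = link.get("href")
        detail = ET.fromstring(fetch(url))   # "click into" the item
        rows.append({
            "url": url,
            "title": detail.findtext("h1"),
            "sku": detail.findtext("p[@class='sku']"),
        })
    return rows

# A dict lookup plays the role of downloading each detail page.
rows = scrape_list_and_detail(LIST_PAGE, DETAIL_PAGES.__getitem__)
```

The fields come from the detail pages, not from the list itself, which is the difference from "List or Table".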

 

6. Complete

Task configuration is now complete. You can run the task with Local Extraction or Cloud Extraction.

 

 

 

3) Scrape from "Single Page" - extract data from a single web page

1. Create a task in Wizard Mode

  · Click "+ Task" 

 

  · Enter the URL and click "Next" 

   

2. Select extraction type

  · Select "Single Page", then click "Next" 

3. Define field

  · Select the target data 

  · Edit the field name

  · Click "Next" 

Task configuration is now complete. You can run the task with Local Extraction or Cloud Extraction.

 

 

Tips!

1. Can you extract data types other than text with Wizard Mode? 

  · Yes. You can select the data type to capture from the "Data type" drop-down list.

  · Usually, data can be extracted as text, inner HTML, or outer HTML.

  · For images, Wizard Mode also supports scraping the "src" attribute.
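If the difference between these data types is unclear, the following Python sketch shows what each one would contain for a small, hypothetical snippet. The helper names are made up for this example, and the standard-library XML parser is used only because the snippet is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet with a paragraph and an image.
SNIPPET = ('<div class="card"><p>Electric <b>kettle</b>, 1.7 L</p>'
           '<img src="/img/kettle.png" alt="Kettle"/></div>')

def text_of(el):
    """Text only: all text inside the element, with tags stripped."""
    return "".join(el.itertext())

def inner_html(el):
    """Inner HTML: everything between the element's opening and closing tags."""
    return (el.text or "") + "".join(ET.tostring(child, encoding="unicode") for child in el)

def outer_html(el):
    """Outer HTML: the element's own tags plus everything inside them."""
    tail, el.tail = el.tail, None   # tostring() would otherwise append trailing text
    try:
        return ET.tostring(el, encoding="unicode")
    finally:
        el.tail = tail

root = ET.fromstring(SNIPPET)
paragraph, image = root.find("p"), root.find("img")

as_text = text_of(paragraph)      # 'Electric kettle, 1.7 L'
as_inner = inner_html(paragraph)  # 'Electric <b>kettle</b>, 1.7 L'
as_outer = outer_html(paragraph)  # '<p>Electric <b>kettle</b>, 1.7 L</p>'
image_src = image.get("src")      # '/img/kettle.png'
```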

       

 

2. Can you modify XPath or re-format data with Wizard Mode?

No. Wizard Mode does not support re-formatting data or modifying XPath. If you need to modify the XPath to improve capture accuracy, or to re-format data, please switch to Advanced Mode.

Learn more about locating elements with XPath and reformatting extracted data.
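For readers who have not met these two terms, here is a small Python sketch of what they mean in general. It does not use Octoparse: an XPath expression locates the element, and a re-formatting step cleans the captured text. The markup, the expression, and the "price_to_float" helper are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed row with a messy price string.
ROW = '<div><span class="price">  Price: $1,299.00 </span></div>'

def price_to_float(raw):
    """Re-format a captured price string into a number, or None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

# Locate the element with an XPath expression, then clean up the captured text.
raw = ET.fromstring(ROW).findtext(".//span[@class='price']")
price = price_to_float(raw)
```

A more specific XPath expression narrows down which element is captured, while re-formatting changes how the captured value is stored.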

 

3. How do you switch to Advanced Mode?

There are two ways to switch to Advanced Mode.

  · After completing the workflow and before running the task, you can switch to Advanced Mode by clicking on "Edit with Advanced Mode".

  

 

  · On the Dashboard, tasks created using Wizard Mode and tasks created using Advanced Mode are marked with different icons.

To switch to Advanced Mode, click "More Actions" at the right end of the task, select "Task", and choose "Edit with Advanced Mode".

   

 

 

 

Related articles:

Advanced Mode 

Locate elements with XPath 

Reformat data extracted 

Local extraction 

Cloud extraction 

Build tasks with Octoparse 
