4. Task Actions in Workflow Designer

 


 

4.1. Open page

 

Enter the URL directly in the address bar of the built-in browser and click “Go”; an “Open a webpage” action is then created automatically.

Alternatively, drag an “Open a webpage” action and drop it into Workflow Designer, enter the URL in the “Page URL” textbox, and click “Save” to open the target webpage.

 

 

Advanced Options

 

. Action Caption: The name of the action

 

 

. Timeout: The maximum time allowed for the page to load.

 

 

. Block Pop-up: Block pop-up windows (which are often ads).

 

 

. Use Loop URL: Use the current loop item as the URL to open. When you select this option, a pop-up window explains that it is available only when this action is the first sub-step of a “Loop Item” action.

 

 

. Scroll Down: Scroll down to the bottom of the page when it finishes loading. You can set the interval between scrolls and choose whether to scroll to the end of the page or down one screen at a time. (Usually used for websites with infinite scrolling.)

 

 

. Cache Settings:

Clear Cache: Choose this option to clear the cache before opening the web page.

Customize Cookie: Choose this option to use a specified cookie.

Cookie: Click “Load cookie from current web page”, then click anywhere in the blank space of the “Cookie” textbox; the cookies of the current web page will be shown in the drop-down menu.

. Retry in following conditions: When you open a URL in Octoparse, you can choose to reload the website if one of the following situations occurs:

. The URL you entered redirects to a different URL, and the resulting page is not the one you want. In this case, enter part of the redirected URL into the “the URL of the result page contains” textbox to reload the website.

. The result page contains content that prevents you from accessing the site on your first visit. For example, some sites ask you to log in or register the first time you visit, but can be accessed normally after a reload.

In this case, enter some content shown in the result page into the “the content of the result page includes” textbox to reload the website.

. The result page does not include the content you want. In this case, enter the content that should appear in the result page into the “the content of the result page does not include” textbox to reload the website.

You can choose the maximum number of reloads from the drop-down list and set the time interval after which the page is refreshed.
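Taken together, the three retry conditions and the reload limit behave like a small decision loop. The sketch below is only a conceptual illustration, not Octoparse's implementation; `RetryRules`, `needs_reload`, `open_with_retry`, and the `fetch` callable are hypothetical names introduced for this example, and the fake site stands in for a real page load.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A page load result: (URL after any redirects, page text).
Page = Tuple[str, str]


@dataclass
class RetryRules:
    # Hypothetical fields mirroring the options described above.
    url_contains: Optional[str] = None          # "the URL of the result page contains"
    content_includes: Optional[str] = None      # "the content of the result page includes"
    content_not_includes: Optional[str] = None  # "the content of the result page does not include"
    max_retries: int = 3                        # maximum number of reloads
    interval_seconds: float = 0.0               # wait before each reload


def needs_reload(page: Page, rules: RetryRules) -> bool:
    """Return True if any configured retry condition matches the loaded page."""
    final_url, content = page
    if rules.url_contains is not None and rules.url_contains in final_url:
        return True
    if rules.content_includes is not None and rules.content_includes in content:
        return True
    if rules.content_not_includes is not None and rules.content_not_includes not in content:
        return True
    return False


def open_with_retry(url: str, fetch: Callable[[str], Page], rules: RetryRules) -> Tuple[Page, int]:
    """Load `url`, reloading while a retry condition matches, up to `max_retries` times.

    Returns the last page loaded and the number of reloads performed.
    """
    page = fetch(url)
    reloads = 0
    while reloads < rules.max_retries and needs_reload(page, rules):
        time.sleep(rules.interval_seconds)
        page = fetch(url)
        reloads += 1
    return page, reloads


# Fake site that shows a login prompt on the first visit only.
responses = iter([
    ("https://example.com/login", "Please log in"),
    ("https://example.com/products", "Product list"),
])
rules = RetryRules(content_includes="Please log in", max_retries=2)
print(open_with_retry("https://example.com/products", lambda _: next(responses), rules))
# (('https://example.com/products', 'Product list'), 1)
```

The reload limit is checked before each condition test, so a page that never changes is reloaded at most `max_retries` times.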

 (Note: After modifying Advanced Options, don’t forget to click the “Save” button.)

 

 

4.2. Click Item/Click to paginate

 

Click any web element on the web page in the built-in browser, then:

1. Choose “Click an item” in the pop-up window. The “Click Item” action will be created in the workflow automatically.

2. Choose “Loop click the element” in the pop-up window. The “Click to paginate” action, along with a “Loop” action, will be created in the workflow automatically.

Alternatively, drag a “Click Item” action and drop it into Workflow Designer to click an element on the webpage.
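Conceptually, “Loop click the element” repeats the same steps on each page and then follows the next-page button until there is none left. The sketch below models a website as a hypothetical dictionary (`PAGES`) and is not how Octoparse drives the browser; the `max_pages` cap is an assumption added so the loop cannot run forever.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical site model: page URL -> (items on the page, URL behind the "Next" button or None).
PAGES: Dict[str, Tuple[List[str], Optional[str]]] = {
    "/page/1": (["a", "b"], "/page/2"),
    "/page/2": (["c"], "/page/3"),
    "/page/3": (["d"], None),
}


def paginate(start: str, pages: Dict[str, Tuple[List[str], Optional[str]]], max_pages: int = 100) -> List[str]:
    """Collect items page by page, following the "Next" link until there is none.

    `max_pages` is a safety cap so a site whose "Next" button never disappears
    cannot loop forever.
    """
    items: List[str] = []
    current: Optional[str] = start
    visited = 0
    while current is not None and visited < max_pages:
        page_items, next_url = pages[current]
        items.extend(page_items)  # actions inside the loop (e.g. extracting data) run on each page
        current = next_url        # "Click to paginate": follow the next-page link
        visited += 1
    return items


print(paginate("/page/1", PAGES))  # ['a', 'b', 'c', 'd']
```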

 

Advanced Options

. Action Caption: The name of the action

 

 

. Wait before execution: Choose how long to wait before executing this action, or execute the action only when a specific element appears. For the latter, enter the XPath of the element into the “or wait until specific element appears” textbox.
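A fixed wait is simply a delay, while “wait until specific element appears” amounts to polling for an XPath match until a timeout. The sketch below illustrates that idea with Python's standard library; `wait_until` and the sample page are hypothetical, and Python's `xml.etree.ElementTree` supports only a subset of XPath, so the path must be written relative (".//div[...]") rather than in the "//div[...]" form a browser accepts.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, poll_interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds have passed.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# A fixed "wait before execution" would just be time.sleep(seconds).
# Waiting for an element is a poll for an XPath match on the page.
page = ET.fromstring("<html><body><div id='results'>3 items</div></body></html>")
found = wait_until(lambda: page.find(".//div[@id='results']") is not None, timeout=1.0)
print(found)  # True
```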

 

. Use Loop: Click the current item of the loop.

 

 

. New Tab: Open the link in a new tab by default.

 

 

. Scroll Down: Scroll down to the bottom of the page when it finishes loading. You can set how many times to scroll, the interval between scrolls, and the way to scroll (either to the bottom of the page or down one full screen at a time). This is usually used for web pages with infinite scrolling.

 

 

. Ajax Load: Choose this option when the webpage loads content with AJAX, and set the AJAX timeout.

 

 

. Page Acceleration: Choose this option to optimize loading of non-AJAX pages.

 

 

. Locate an anchor: Choose this option to relocate to the anchor once the page finishes loading.

 

 

4.3. Extract Data

Drag an “Extract Data” action and drop it into Workflow Designer to extract data from the webpage. (Note: When you select data to extract on the webpage, an “Extract Data” action is created automatically.)

 

. Define Fields: The data you choose to extract will be shown in these fields.

You can rename each field and, from the drop-down list, choose what to do if the item is not found on the web page.
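The idea of renamed fields with a fallback for missing items can be sketched as building one output row per page. The policy names below (`LEAVE_BLANK`, `SKIP_ROW`) are hypothetical placeholders chosen for illustration; they are not the actual options in Octoparse's drop-down list, and `extract_row` is not an Octoparse function.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "if not found" policies; the real drop-down options may differ.
LEAVE_BLANK = "leave blank"
SKIP_ROW = "skip row"

# (key of the element on the page, field name after renaming, policy if the element is missing)
FieldSpec = Tuple[str, str, str]


def extract_row(page_values: Dict[str, str], specs: List[FieldSpec]) -> Optional[Dict[str, str]]:
    """Build one output row from the values found on a page, or None if the row is skipped."""
    row: Dict[str, str] = {}
    for key, field_name, policy in specs:
        if key in page_values:
            row[field_name] = page_values[key]
        elif policy == LEAVE_BLANK:
            row[field_name] = ""
        elif policy == SKIP_ROW:
            return None
        else:
            raise ValueError(f"unknown policy: {policy}")
    return row


specs = [("title", "Product", SKIP_ROW), ("price", "Price", LEAVE_BLANK)]
print(extract_row({"title": "Desk lamp"}, specs))  # {'Product': 'Desk lamp', 'Price': ''}
print(extract_row({"price": "9.99"}, specs))       # None
```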

. Action Caption: The name of the action

 

. Wait before execution/Wait until specific element appears: Choose how long to wait before executing this action, or execute the action only when a specific element appears. For the latter, enter the XPath of the element into the “or wait until specific element appears” textbox.

 

 

4.4. Enter Text

Drag an “Enter Text” action and drop it into Workflow Designer to enter text in a textbox on the webpage. (Note: When you select a textbox on the webpage, such as a search bar or login textbox, and choose “enter text value”, an “Enter Text” action is created automatically.)

 

 

. Text to input: Type the text you want to enter into the textbox on the webpage (usually a search bar or login textbox).

 

 

. Action Caption: The name of the action

 

 

. Wait before execution/Wait until specific element appears: Choose how long to wait before executing this action, or execute the action only when a specific element appears. For the latter, enter the XPath of the element into the “or wait until specific element appears” textbox.

 

 

. Loop Text: Enter each text value from the “Loop Item” list into the textbox in turn. (Usually used for scraping data by searching multiple keywords on a website.)
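Looping over text values is essentially a loop that runs the same search once per keyword and keeps the results apart. In the sketch below, `search_each` and the `catalog` list are hypothetical stand-ins: the `search` callable represents the “Enter Text”, search, and extraction steps that Octoparse would perform in the browser.

```python
from typing import Callable, Dict, List


def search_each(keywords: List[str], search: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Enter each keyword in turn and collect the results returned for it."""
    results: Dict[str, List[str]] = {}
    for keyword in keywords:                # one iteration per item in the loop list
        results[keyword] = search(keyword)  # stands in for Enter Text + search + Extract Data
    return results


# Hypothetical catalogue standing in for a website's search results.
catalog = ["red shoe", "blue shoe", "red hat"]
print(search_each(["red", "hat"], lambda kw: [c for c in catalog if kw in c]))
# {'red': ['red shoe', 'red hat'], 'hat': ['red hat']}
```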

 

 

4.5. Switch Combo box

 

. Action Caption: The name of the action

 

 

. Wait before execution/Wait until specific element appears: Choose how long to wait before executing this action, or execute the action only when a specific element appears. For the latter, enter the XPath of the element into the “or wait until specific element appears” textbox.

 

 

. Use Loop

 

 

. Ajax Load: Choose this option when the webpage loads content with AJAX, and set the AJAX timeout.

 

 

4.6. Loop Item

Drag a “Loop Item” action and drop it into Workflow Designer to repeat other actions. (e.g. If you drop a “Click Item” action into a “Loop Item” action, the “Click Item” action is executed repeatedly.) (Note: When you “create a list of items” to open webpages in a loop, a “Loop Item” action is created automatically.)

 

. Loop Item List: The items to be looped over (e.g. webpages) are listed here.

 

 

. Action Caption: The name of the action

 

 

. Wait before execution/Wait until specific element appears: Choose how long to wait before executing this action, or execute the action only when a specific element appears. For the latter, enter the XPath of the element into the “or wait until specific element appears” textbox.

 

 

. IFrame

 

 

. Loop Mode: Choose the mode the loop uses to cycle through its items.

 

 

. List of URL: Open multiple webpages in a loop by entering multiple URLs.

 

 

. Quit loop when: To quit the loop after a set number of executions, open “Quit loop when”, choose the option “Execution times equals”, and enter or choose the number of executions.
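The “Execution times equals” condition can be pictured as a counter checked before each pass through the loop. The sketch below is a hypothetical model (`run_loop` and its `quit_after` parameter are not Octoparse names): the body runs once per loop item, and the loop stops as soon as the configured number of executions has been reached.

```python
from typing import Callable, Iterable, Optional


def run_loop(items: Iterable[str], body: Callable[[str], None], quit_after: Optional[int] = None) -> int:
    """Run `body` once per loop item, stopping once `quit_after` executions have run.

    `quit_after=None` means the loop runs through every item. Returns the number of executions.
    """
    executions = 0
    for item in items:
        if quit_after is not None and executions >= quit_after:
            break  # execution count reached the configured value: quit the loop
        body(item)
        executions += 1
    return executions


opened = []
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
print(run_loop(urls, opened.append, quit_after=2))  # 2
print(opened)  # ['https://example.com/a', 'https://example.com/b']
```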

 

 

 

4.7. Branch judgment (If-else)

Drag a “Branch judgment” action and drop it into Workflow Designer to execute different actions based on different conditions. (Note: It executes one set of actions if a condition is true and another set if that condition is false.)
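The if-else behaviour of “Branch judgment” can be sketched as evaluating one condition against the current page and running only the matching branch. Everything in the example is hypothetical: `branch_judgment`, the sign-in condition, and the string-returning actions stand in for real workflow actions such as entering credentials or extracting data.

```python
from typing import Callable, List

Action = Callable[[str], str]


def branch_judgment(page_text: str, condition: Callable[[str], bool],
                    if_true: List[Action], if_false: List[Action]) -> List[str]:
    """Run the `if_true` actions when the condition holds, otherwise the `if_false` actions."""
    chosen = if_true if condition(page_text) else if_false
    return [action(page_text) for action in chosen]


def asks_to_sign_in(text: str) -> bool:
    # Hypothetical condition: does the page ask the visitor to sign in?
    return "Sign in" in text


log_in: List[Action] = [lambda text: "enter credentials"]
extract: List[Action] = [lambda text: "extract data"]

print(branch_judgment("Sign in to continue", asks_to_sign_in, log_in, extract))  # ['enter credentials']
print(branch_judgment("Product list", asks_to_sign_in, log_in, extract))         # ['extract data']
```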

 

. Action Caption: The name of the action

 

 

. Wait before execution/Wait until specific element appears: Choose how long to wait before executing this action, or execute the action only when a specific element appears. For the latter, enter the XPath of the element into the “or wait until specific element appears” textbox.

 

 

4.8. Cursor Over

Drag a “Cursor Over” action and drop it into Workflow Designer to hover the cursor over a specific element.

 

 

. Item: The element to hover the cursor over.

 

 

. Action Caption: The name of the action

 

 

. Wait before execution/Wait until specific element appears: Choose how long to wait before executing this action, or execute the action only when a specific element appears. For the latter, enter the XPath of the element into the “or wait until specific element appears” textbox.

 

 

 

. Ajax Load: Choose this option when the webpage loads content with AJAX, and set the AJAX timeout.

 

 

4.9. End Loop

Drag an “End Loop” action and drop it into Workflow Designer to stop the “Loop Item” action from running.
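Assuming “End Loop” behaves like Python's `break` (this guide does not describe in detail what runs after the loop ends), a typical use is to place it behind a condition so the loop stops early, for example when a results page says there is nothing more to collect. `collect_until` and the sample pages are hypothetical.

```python
from typing import List


def collect_until(pages: List[str], stop_text: str) -> List[str]:
    """Process pages in a loop; stop early when a page contains `stop_text`."""
    collected: List[str] = []
    for text in pages:
        if stop_text in text:  # a branch condition decides when to end the loop
            break              # leave the loop early (the assumed End Loop behaviour)
        collected.append(text)
    return collected


print(collect_until(["page 1", "page 2", "No more results", "page 4"], "No more results"))
# ['page 1', 'page 2']
```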

  

4.10. End Workflow

Drag an “End Workflow” action and drop it into Workflow Designer to stop the rule from running. (Note: Workflow Designer already contains an “End Workflow” action, as well as a “Start” action.)

 

Note: If you have configured a data extraction rule but do not get the data you want, something is probably wrong with the rule. This happens especially often when the site you plan to crawl is very complicated or the amount of data you want is very large. In this case, check the rule by clicking each action in order.
