
Retry actions

Thursday, August 16, 2018

The Retry action is an Octoparse feature that reloads the web page you want to scrape when a condition you specify is met.


Why set up "Retry"?

When a web page does not load properly, Octoparse may be unable to scrape data from it or even to execute the next actions in the workflow. In this case, Octoparse needs to reload the page before starting the extraction.


How to set up "Retry"?

The Retry setting is available only in three page-loading actions in the workflow: Go To Web Page, Click Item, and Click to paginate.


· Tick the "Retry when" box, then click the configuration icon to set the condition

Octoparse needs a condition to determine whether the page has loaded correctly, so that it can reload the page if loading fails.


· Configure the "URL/content/element(XPath) contains" option and the "Contain/Does not contain" option

When a page fails to load, the site usually signals the problem in the URL or content of the current page, for example "/errors", "500 Internal Server Error", or "Too many requests". Enter such a string in the text box as the condition and select "Contains". Octoparse will then reload the page whenever it detects that string in the URL or content of the current page.
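To make the "Contains" logic concrete, here is a minimal Python sketch of how such a text condition could be evaluated. It is not Octoparse's implementation; the `Page` class, the `text_condition_met` function, and the "url"/"content" and "contains"/"does not contain" parameter values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical stand-in for the page Octoparse has just loaded.
    url: str
    content: str


def text_condition_met(page: Page, source: str, value: str, mode: str) -> bool:
    """Return True when the page should be reloaded.

    source: "url" or "content" (which part of the page to inspect)
    mode:   "contains" or "does not contain"
    """
    haystack = page.url if source == "url" else page.content
    found = value in haystack
    # "Contains" retries when the string is present; "Does not contain" retries when it is absent.
    return found if mode == "contains" else not found


error_page = Page("https://example.com/errors", "<p>Something went wrong</p>")
normal_page = Page("https://example.com/list", "<h1>Products</h1>")

print(text_condition_met(error_page, "url", "/errors", "contains"))
print(text_condition_met(normal_page, "content", "Too many requests", "contains"))
```

With these inputs the first call reports that a retry is needed and the second does not.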

Alternatively, you can enter the XPath of an element that appears only when the page loads correctly. In this case, select "Does not contain". Octoparse will then reload the page whenever it cannot find an element matching that XPath on the current page.
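The "Does not contain" check on an element can be sketched in Python as below. This is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree`, which supports just a subset of XPath, needs well-formed markup, and requires relative paths such as `.//div[...]` rather than `//div[...]`. Real pages usually need a full HTML parser. The function name `element_missing` and the `product-list` class are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_missing(page_markup: str, path: str) -> bool:
    """Return True when no element matches `path`, i.e. a
    "Does not contain" condition is met and the page should be reloaded."""
    root = ET.fromstring(page_markup)  # assumes well-formed markup
    return root.find(path) is None


# Hypothetical XPath of an element that exists only on a correctly loaded page.
LIST_PATH = ".//div[@class='product-list']"

loaded = "<html><body><div class='product-list'><p>Item</p></div></body></html>"
blocked = "<html><body><p>Too many requests</p></body></html>"

print(element_missing(loaded, LIST_PATH))
print(element_missing(blocked, LIST_PATH))
```

The correctly loaded page contains the element, so no retry is triggered; the blocked page lacks it, so a retry is triggered.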

You can click the add icon to add multiple conditions for Octoparse to use when deciding whether to reload the page.
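This article does not say how Octoparse combines multiple conditions, so the following Python sketch treats that as an open assumption and exposes both options through a hypothetical `require_all` flag: by default a retry happens when any condition is met, and with `require_all=True` only when all of them are. The `should_retry` name and the example conditions are illustrative only.

```python
from typing import Callable, List

# Hypothetical condition type: takes (url, content), returns True if a retry is needed.
Condition = Callable[[str, str], bool]


def should_retry(url: str, content: str, conditions: List[Condition],
                 require_all: bool = False) -> bool:
    """Combine several retry conditions (assumed semantics: any vs. all)."""
    if not conditions:
        return False  # no conditions configured, so never retry
    results = [cond(url, content) for cond in conditions]
    return all(results) if require_all else any(results)


conditions: List[Condition] = [
    lambda url, content: "/errors" in url,
    lambda url, content: "Too many requests" in content,
]

print(should_retry("https://example.com/errors", "<p>ok</p>", conditions))
print(should_retry("https://example.com/errors", "<p>ok</p>", conditions, require_all=True))
```

Treating the combination rule as a parameter keeps the sketch honest about the unknown behavior while showing how either rule would work.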


· Set up "Maximum reload times" and the interval time

To keep Octoparse from reloading the web page endlessly, set the maximum number of retries. Once Octoparse reaches that number, it stops retrying and moves on to the next step. The interval time controls how long Octoparse waits between reloads.
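The bounded retry described above can be sketched as a simple loop in Python. This is an illustration, not Octoparse's code: `load_with_retry`, `load_page`, and `needs_retry` are hypothetical names, and the sketch assumes that "Maximum reload times" counts reloads after the first load and that the interval is a fixed pause between reloads.

```python
import time
from typing import Callable, Tuple


def load_with_retry(load_page: Callable[[], str],
                    needs_retry: Callable[[str], bool],
                    max_reloads: int,
                    interval_seconds: float) -> Tuple[str, int]:
    """Load a page, then reload it while needs_retry() is True,
    at most max_reloads times, waiting interval_seconds between reloads.
    Returns the last content and the number of reloads performed."""
    content = load_page()
    reloads = 0
    while needs_retry(content) and reloads < max_reloads:
        time.sleep(interval_seconds)  # pause before the next reload
        content = load_page()
        reloads += 1
    # Whether or not the last load succeeded, control passes to the next step.
    return content, reloads


# Simulated server that rate-limits the first two requests.
responses = iter(["Too many requests", "Too many requests", "<div>Products</div>"])
content, reloads = load_with_retry(
    load_page=lambda: next(responses),
    needs_retry=lambda c: "Too many requests" in c,
    max_reloads=3,
    interval_seconds=0,
)
print(content, reloads)  # <div>Products</div> 2
```

If the page never recovers, the loop stops after `max_reloads` attempts and returns the failed content, which mirrors moving on to the next step instead of looping forever.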


