You are browsing a tutorial guide for Octoparse's latest version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!

In some cases, a task extracts every field correctly with Local Extraction but returns blank fields with Cloud Extraction. This tutorial explains the causes of this issue and how to solve it.

1. Tasks executed with Cloud Extraction are splittable and run too fast, so some elements can be skipped.

Tasks with the "Fixed List," "List of URLs," and "Text List" loop modes are splittable. The main task is split into sub-tasks that run on multiple cloud nodes simultaneously. As a result, every step of the task runs very fast, and some pages may not be loaded completely before the task moves to the next step.
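To picture what splitting means, here is a minimal Python sketch that distributes a list of URLs across several workers. It is only an illustration: the function name split_into_subtasks, the round-robin strategy, the node count, and the example.com URLs are all assumptions, and the way Octoparse actually divides a task in the cloud is not described in this tutorial.

```python
from typing import List


def split_into_subtasks(urls: List[str], node_count: int) -> List[List[str]]:
    """Distribute URLs round-robin across a hypothetical number of cloud nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    subtasks: List[List[str]] = [[] for _ in range(node_count)]
    for index, url in enumerate(urls):
        subtasks[index % node_count].append(url)
    # Drop empty sub-tasks when there are fewer URLs than nodes.
    return [chunk for chunk in subtasks if chunk]


# Placeholder URLs, used only for illustration.
urls = [f"https://example.com/item/{i}" for i in range(1, 8)]
subtasks = split_into_subtasks(urls, node_count=3)
for number, chunk in enumerate(subtasks, start=1):
    print(f"sub-task {number}: {len(chunk)} URLs")
```

Because each sub-task holds only part of the list and all of them run at the same time, the whole job finishes sooner, which is also why a slow page has less time to finish loading.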

To ensure the web page is loaded completely in the cloud, you can try to:

Increase the timeout for the Go To Web Page step

Set up Wait before action

A wait time can be set for every step in the workflow. We suggest setting it for the Extract Data actions.

Set up an anchor element to find before action

This setting makes the extraction start only after a certain element has been found. You can use the XPath of any element from the desired fields.

First, click on the Extract Data step. Second, enter the XPath of the anchor element and change Wait before action to 30s.

Tip: How to get the XPath of a certain element on the page?

Click the Extract Data step
Switch to the vertical view, and you will see the XPath for each field
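The idea behind waiting for an anchor element can be sketched in Python as a polling loop: keep checking the page until the XPath matches or the wait time runs out. This is a simplified sketch under stated assumptions, not how Octoparse implements the setting. The helper names wait_for_anchor and fake_page_source are hypothetical, the page is simulated with two well-formed snapshots, and the standard library's xml.etree.ElementTree supports only a limited subset of XPath.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def wait_for_anchor(
    get_page_source: Callable[[], str],
    anchor_xpath: str,
    timeout_seconds: float = 30.0,
    poll_interval: float = 0.5,
) -> Optional[ET.Element]:
    """Poll the page until anchor_xpath matches; return None on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            root = ET.fromstring(get_page_source())
            match = root.find(anchor_xpath)
        except ET.ParseError:
            # A half-loaded page may not parse yet; treat it as "not found".
            match = None
        if match is not None:
            return match
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


# Simulated page: the price element only appears in the second snapshot.
snapshots = iter([
    "<html><body><div id='list'/></body></html>",
    "<html><body><div id='list'><span class='price'>19.99</span></div></body></html>",
])
last = {"source": ""}


def fake_page_source() -> str:
    # Return the next snapshot, or repeat the last one when none are left.
    last["source"] = next(snapshots, last["source"])
    return last["source"]


anchor = wait_for_anchor(
    fake_page_source, ".//span[@class='price']", timeout_seconds=5, poll_interval=0.01
)
print(anchor.text if anchor is not None else "anchor not found before timeout")
```

Choosing an anchor that loads late on the page, such as one of the fields you actually extract, means the rest of the page is likely to be ready once it appears.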

2. The website you are after is multi-regional

A multi-regional website may serve different page structures to visitors from different countries. When a task is set to run in the cloud, it is executed with our IPs based in America by default. In this case, for tasks targeting websites outside America, some data may be skipped because it can’t be found on the version of the website opened in the cloud.

To identify if the website is multi-regional, you can:

Test the task with Local Extraction. If no data is missing locally while data is missing in Cloud Extraction, the website is most likely multi-regional. In this case, as the targeted content can only be found when opening the website with your own IP, we suggest using Local Extraction to get the data, or using the Octoparse proxy to access the website with IPs from a specific country.

Check the Cloud log screenshot to see if the web page loaded properly. You can check whether the page looks the same as it does on your device.
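If you have exported the results of both runs, the local and cloud comparison can also be done with a short script. The Python sketch below reports the fields that are mostly blank in the cloud run but filled in the local run. The function names, the 0.5 threshold, and the sample rows are assumptions made up for illustration; real exports would be loaded from your own files.

```python
from typing import Dict, List

Row = Dict[str, str]


def blank_rate(rows: List[Row], field: str) -> float:
    """Share of rows in which the field is missing or empty."""
    if not rows:
        return 0.0
    blanks = sum(1 for row in rows if not (row.get(field) or "").strip())
    return blanks / len(rows)


def fields_blank_only_in_cloud(
    local_rows: List[Row], cloud_rows: List[Row], threshold: float = 0.5
) -> List[str]:
    """Fields whose blank rate in the cloud exceeds the local one by the threshold."""
    fields = sorted({name for row in local_rows + cloud_rows for name in row})
    return [
        name
        for name in fields
        if blank_rate(cloud_rows, name) - blank_rate(local_rows, name) >= threshold
    ]


# Made-up sample rows standing in for exported data.
local_rows = [
    {"title": "Shirt", "price": "19.99"},
    {"title": "Jeans", "price": "39.50"},
]
cloud_rows = [
    {"title": "Shirt", "price": ""},
    {"title": "Jeans", "price": ""},
]

suspect_fields = fields_blank_only_in_cloud(local_rows, cloud_rows)
print(suspect_fields)
```

A field that shows up in this list is a candidate for region-specific content, which you can then confirm against the Cloud log screenshot.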

Here is a related tutorial for checking errors in the Cloud: Why does the task get no data in the Cloud but work well when running in the local?

Scrape product info from Myntra (Jabong)