A newer version of this tutorial is available.
Why did I get blank fields?
When you select an element on the web page in the built-in browser, Octoparse works out the specific pattern (an XPath) that identifies it in the source code of the page. Based on that pattern, all "similar" elements across multiple pages are detected and extracted.
By default, if Octoparse cannot find an element matching the defined pattern on the page, the field is left blank.
In what cases would Octoparse fail to find an element matching the defined pattern?
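The behaviour described above can be sketched outside Octoparse. The following is a minimal illustration, not Octoparse's own implementation: it uses Python's standard `xml.etree.ElementTree` module, which supports only a subset of XPath, and the page snippets, field names and patterns are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Two simplified, hypothetical pages; the second one has no price element.
PAGES = {
    "page-1": "<html><body><div class='item'><h1>Lamp</h1>"
              "<span class='price'>$20</span></div></body></html>",
    "page-2": "<html><body><div class='item'><h1>Chair</h1></div></body></html>",
}

# Hypothetical field patterns, written in the XPath subset ElementTree supports.
FIELDS = {
    "title": ".//div[@class='item']/h1",
    "price": ".//span[@class='price']",
}

def extract(page_source, fields):
    """Return one row for a page; a pattern that matches nothing yields a blank field."""
    root = ET.fromstring(page_source)
    return {
        name: (root.findtext(xpath, default="") or "").strip()
        for name, xpath in fields.items()
    }

rows = {page: extract(source, FIELDS) for page, source in PAGES.items()}
for page, row in rows.items():
    print(page, row)
```

The same patterns are applied to every page, so the row for the second page keeps its title but ends up with an empty price.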
The most common cases include:
- Your desired data does not appear on every page being extracted.
- Your desired data appears on every page, but not always at the same location.
- Some of your desired data is accidentally missed by the pattern, even though it sits at the expected location.
- Octoparse starts extraction before your desired data has loaded on the page.
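The second case is the easiest to reproduce in a few lines. The sketch below assumes two hypothetical page fragments in which the price sits in a different position, and compares a position-based pattern with one anchored on an attribute. It again uses Python's `xml.etree.ElementTree` as a stand-in; the patterns Octoparse generates may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: page B has an extra promo paragraph, page A does not.
PAGE_A = "<div><p class='brand'>Acme</p><p class='price'>$20</p></div>"
PAGE_B = ("<div><p class='promo'>Sale</p><p class='brand'>Acme</p>"
          "<p class='price'>$15</p></div>")

POSITIONAL = "./p[3]"                 # defined on page B: "the third paragraph"
BY_ATTRIBUTE = "./p[@class='price']"  # "the paragraph marked as the price"

def grab(source, xpath):
    """Return the text of the first match, or a blank string if nothing matches."""
    return ET.fromstring(source).findtext(xpath, default="")

positional_results = [grab(PAGE_A, POSITIONAL), grab(PAGE_B, POSITIONAL)]
attribute_results = [grab(PAGE_A, BY_ATTRIBUTE), grab(PAGE_B, BY_ATTRIBUTE)]
print(positional_results)
print(attribute_results)
```

The positional pattern leaves the field blank on page A, because that page has no third paragraph, while the attribute-based pattern finds the price on both pages.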
How do I deal with blank fields?
When the extracted result contains blank fields, each one may have a different cause. To find the exact cause, you need to inspect the specific page that is missing the data.
Octoparse provides a shortcut for tracing those pages. When extracting data from multiple pages, you can capture the URL of each page along with the data:
Add predefine fields > Add current page information > Web page URL
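With the page URL stored in each row, the pages worth inspecting can also be listed with a short script. This is a rough sketch that assumes the result was exported to CSV; the column names (`title`, `price`, `page_url`) and the URLs are hypothetical and depend on how the fields are named in your task.

```python
import csv
import io

# Hypothetical export; in practice this text would be read from the exported file.
EXPORT = """title,price,page_url
Lamp,$20,https://example.com/item/1
Chair,,https://example.com/item/2
,$35,https://example.com/item/3
"""

def blank_fields_by_url(csv_text, url_column="page_url"):
    """Map each page URL to the names of the fields left blank in its row."""
    report = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        blanks = [
            name for name, value in row.items()
            if name != url_column and not (value or "").strip()
        ]
        if blanks:
            report[row[url_column]] = blanks
    return report

report = blank_fields_by_url(EXPORT)
for url, fields in report.items():
    print(url, "is missing", ", ".join(fields))
```

Rows with no blank fields are skipped, so the report contains only the URLs that need a closer look.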
Pick out the rows with blank fields in the extracted result and load the corresponding URLs in the browser. You can then work out why the defined pattern failed on each page and how to deal with it:
- Desired data doesn't appear on this page > That's OK; no change is needed.
- Desired data appears at a different location than on the other pages > Modify the XPath
- Desired data appears at the same location but is not captured as it is on the other pages > Modify the XPath
- Desired data appears at the same location and is captured successfully > Set up Wait before execution, or try a second run
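The last case is a timing problem: the pattern is correct, but the data was not yet on the page when it was read. The sketch below shows the general idea behind waiting before extraction, using a simulated page whose price appears only on the third read. It is an illustration under those assumptions, not a description of how Octoparse implements its wait setting; `wait_for_value` and `read_price` are hypothetical names.

```python
import time

def wait_for_value(read_field, timeout=2.0, interval=0.05):
    """Poll read_field() until it returns a non-blank value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = read_field()
        if value:
            return value
        if time.monotonic() >= deadline:
            return ""  # still blank after waiting
        time.sleep(interval)

# Simulated page: the price is blank on the first two reads, then appears.
_reads = iter(["", "", "$20"])

def read_price():
    return next(_reads, "$20")

immediate = read_price()             # read too early: blank
waited = wait_for_value(read_price)  # polls until the price shows up
print(repr(immediate), repr(waited))
```

If the value never appears within the timeout, the function still returns a blank string, which corresponds to the first case in the list above rather than a timing issue.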