An updated version of this tutorial, based on the latest version of the webpage, is now available. Check it out here!
In many cases, the data we need is located right next to its field name, as shown below.
It is easy to scrape the same field from multiple pages with Octoparse when the field sits at the same position on every page. But some fields are not always in the same position. For example, on Amazon product detail pages, the field “Average Customer Review” does not always appear on the same line, as shown below. In that case Octoparse may extract the wrong data, because it cannot detect that the field has moved to a different line. So how can we scrape only the text next to a certain field?
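To make the problem concrete, here is a minimal sketch using only Python's standard library. It assumes two hypothetical, simplified product-detail snippets (invented markup, not real Amazon HTML) and a made-up helper, second_line, that always reads the second line, much like a fixed-position XPath such as /ul/li[2]. On page B one extra line pushes the review down, so the same rule returns a different field.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product-detail snippets (not real Amazon markup).
# On page B an extra line pushes "Average Customer Review" down one position.
PAGE_A = """<ul>
  <li><b>Shipping Weight:</b> 1.2 pounds</li>
  <li><b>Average Customer Review:</b> 4.5 out of 5 stars</li>
</ul>"""

PAGE_B = """<ul>
  <li><b>Shipping Weight:</b> 1.2 pounds</li>
  <li><b>Item model number:</b> ABC-123</li>
  <li><b>Average Customer Review:</b> 3.9 out of 5 stars</li>
</ul>"""


def second_line(page):
    """Hypothetical helper: extract by fixed position, like the XPath /ul/li[2]."""
    root = ET.fromstring(page)   # root is the <ul> element
    li = root.find("li[2]")      # ElementTree supports [position] predicates
    # itertext() yields the label inside <b> and then the value in its tail text
    return "".join(li.itertext()).strip()


print(second_line(PAGE_A))  # Average Customer Review: 4.5 out of 5 stars
print(second_line(PAGE_B))  # Item model number: ABC-123  (wrong field)
```

The rule is correct for page A only by coincidence of layout; nothing in it refers to the field we actually want.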
Octoparse handles this by letting you define how an item is located. First, write an XPath that locates the field name; then modify that XPath so it precisely locates the text next to the field name.
In the example above, we can first write an XPath in Firebug that locates the field name “Average Customer Review” using “text contains”: //*[contains(text(),'Average Customer Review')]
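Then modify the XPath so it points to the value of the field: //*[contains(text(),'Average Customer Review')]/../div/span/a Here /.. moves up from the element holding the field name to its parent, and div/span/a then walks down to the link that holds the value. (The exact XPath will differ from case to case.)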
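The two XPath steps above can be sketched in Python to show why anchoring on the field name survives layout changes. This is a hypothetical illustration, not how Octoparse evaluates XPath internally: the markup is invented and simplified, and locate_near_label is a made-up helper. Because xml.etree.ElementTree supports only a subset of XPath (no contains() function, and no way to step to the parent of the element a search starts from), both of those steps are emulated by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product-detail snippets (not real Amazon markup).
# Page B has one extra line, so the review sits at a different position.
PAGE_A = """<div id="detail"><ul>
  <li><b>Shipping Weight:</b> 1.2 pounds</li>
  <li><div>
    <span>Average Customer Review:</span>
    <div><span><a>4.5 out of 5 stars</a></span></div>
  </div></li>
</ul></div>"""

PAGE_B = """<div id="detail"><ul>
  <li><b>Shipping Weight:</b> 1.2 pounds</li>
  <li><b>Item model number:</b> ABC-123</li>
  <li><div>
    <span>Average Customer Review:</span>
    <div><span><a>3.9 out of 5 stars</a></span></div>
  </div></li>
</ul></div>"""


def locate_near_label(page, label, relative_path):
    """Hypothetical helper approximating //*[contains(text(), label)]/../relative_path."""
    root = ET.fromstring(page)
    # Elements do not store their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    values = []
    for el in root.iter():
        # Step 1: el.text is the text before the element's first child; this
        # approximates the first text node that contains(text(), ...) tests.
        if el.text and label in el.text:
            parent = parent_of.get(el)  # Step 2a: the /.. step
            if parent is not None:
                # Step 2b: walk down from the parent, e.g. "div/span/a".
                for match in parent.findall(relative_path):
                    values.append("".join(match.itertext()).strip())
    return values


for page in (PAGE_A, PAGE_B):
    print(locate_near_label(page, "Average Customer Review", "div/span/a"))
```

Because the lookup starts from the field name rather than from a line number, the extra line on page B does not change the result. Real web pages are rarely well-formed XML; an HTML-aware XPath engine such as lxml (etree.HTML plus .xpath()) could evaluate the original expression directly.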
Click “Customize Field”, choose “Define ways to locate an item”, enter the XPath in the “Matching Xpath” box, and click “Save”.