Locate elements with XPath

Wednesday, November 24, 2021

A newer version of this tutorial is available.


What is XPath? How does it work in Octoparse?

XPath is a query language for locating specific elements on a page. An XPath that you write or modify by hand in Octoparse is often more flexible and accurate than the one auto-generated when you click elements during task configuration.

Octoparse allows you to modify the XPath so that you can precisely locate the data you want to scrape. To learn more about XPath, see this tutorial: https://www.w3schools.com/xml/xpath_intro.asp


When should I use XPath?

In most cases, you don’t need to write XPath yourself. In some situations, however, you may need to modify it to locate the data on the webpage more accurately.

(This is an advanced tutorial. Before using XPath, we suggest you get more familiar with Octoparse.)

  • Data in irregular locations
  • Extra or missing data
  • Pagination without a "Next" button
  • A "Next" button that cannot be located precisely
  • Drop-down menu without a switch loop
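
As an illustration of the pagination cases in the list above, the following Python sketch locates a "Next" link by its text instead of by its position. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The markup in PAGE and the helper name find_next_link are hypothetical and exist only for this example; in Octoparse you would enter just the XPath string.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup (assumption); real pages will differ.
PAGE = """<div class="pager">
  <a href="/list?page=1">1</a>
  <a href="/list?page=2">2</a>
  <a href="/list?page=2">Next</a>
</div>"""


def find_next_link(markup):
    """Return the href of the link whose text is exactly 'Next', or None."""
    root = ET.fromstring(markup)
    # [.='Next'] matches elements whose complete text content equals 'Next'
    # (supported by ElementTree since Python 3.7).
    link = root.find(".//a[.='Next']")
    return None if link is None else link.get("href")


print(find_next_link(PAGE))
```

Matching on the link text keeps the expression working when the number of page links changes. With the fuller XPath syntax shown later in this tutorial, the same idea can be written with contains(text(), 'Next').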


Where can I modify XPath in Octoparse?

To modify XPath in Octoparse:

Select the data field that needs to be modified, then select "Customize data field".


Select "Customize XPath".


Enter the new XPath in the "Matching XPath" textbox.


For steps such as "Loop Item", used for pagination or for switching drop-down options, you can find the XPath textbox under "Advanced Options". Enter the new XPath and click "OK" to save your changes.


How to write XPath?

If you are new to XPath, you may need to learn some HTML basics first. XPath locates elements based on their tags and attributes, so before you write your own XPath, inspect the HTML structure of the page.
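
To make the idea of locating elements by tag and attribute concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree, which implements a subset of XPath. The product markup, the class names, and the variable names are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical product list (assumption), used only for illustration.
SNIPPET = """<ul id="products">
  <li><span class="title">Kettle</span><span class="price">19.90</span></li>
  <li><span class="title">Toaster</span><span class="price">24.50</span></li>
</ul>"""

root = ET.fromstring(SNIPPET)

# Tag only: every <span> that is a child of an <li> under the root.
spans = root.findall("./li/span")

# Tag plus attribute: only the spans whose class attribute is "price".
prices = [el.text for el in root.findall("./li/span[@class='price']")]

print(len(spans), prices)
```

The tag-only path returns both the title and the price spans, while the attribute predicate narrows the match to the prices. That is why it helps to inspect the page's HTML before writing the expression.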

(More tutorials about HTML)


We suggest using Firebug, a Firefox plugin that is very useful for inspecting the elements of an HTML document.

(Firebug is now available only for old versions of Firefox. Get an old version of Firefox here.)


Open a webpage in Firefox, click the Firebug button, and then click an element on the page to inspect it. Firebug will show the XPath of that element.


Octoparse also helps with XPath generation through its XPath tool. You can use the tool to generate a working XPath by setting the appropriate criteria. You can find the XPath tool in the "Tools" box.


Common XPath expressions used in Octoparse

In this section, we will go through some basic and commonly used XPath expressions in Octoparse.




Selects the current node


Selects all elements


Selects elements starting from the current node


Selects attributes


Selects all <div> elements one or more levels deep in the current context


Selects the <li> elements which enclose an <a> element

//li[a or h2]

Selects the <li> elements which enclose either an <a> or an <h2> element


Selects only the <div> elements whose class attribute is “publish-time”


Selects all text that is “Next”

//a[contains(text(), 'Next')]

Selects the <a> elements whose text contains “Next”

.//*[contains(@class, 'name')]

Selects all elements whose class attribute contains the string “name”


Selects all siblings after the current node


Selects the first <p> element after an <h1>
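
Several of the descriptions above can be tried out with a short Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only part of XPath, so contains() is emulated in plain Python. The expressions in the code are standard XPath forms chosen to match the descriptions; they and the sample markup are assumptions for illustration, not a record of what Octoparse runs internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (assumption); class names and text are invented.
DOC = """<body>
  <h1>Results</h1>
  <p>Intro</p>
  <ul>
    <li><a href="/a">Item A</a></li>
    <li><h2>Item B</h2></li>
    <li>Item C</li>
  </ul>
  <div class="publish-time">Today</div>
  <div><div class="user-name">Ann</div></div>
  <a href="/next">Next</a>
</body>"""

root = ET.fromstring(DOC)

# "." is the current node (here, the <body> root itself).
current = root.find(".")

# "*" matches every child element of the current node.
children = root.findall("*")

# ".//div" matches <div> elements at any depth below the current node.
divs = root.findall(".//div")

# ".//li[a]" matches <li> elements that enclose an <a> child.
li_with_a = root.findall(".//li[a]")

# An attribute predicate matches on an exact attribute value.
publish_time = root.findall(".//div[@class='publish-time']")

# ElementTree has no contains(), so the substring test performed by
# .//*[contains(@class, 'name')] is emulated with a Python filter.
class_has_name = [el for el in root.iter() if "name" in el.get("class", "")]

print(current.tag, len(children), len(divs), len(li_with_a))
print(publish_time[0].text, class_has_name[0].text)
```

Expressions such as //li[a or h2] and the following-sibling axis need a full XPath 1.0 engine and are outside what ElementTree evaluates, so they are left out of this sketch.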


XPath is very powerful and this tutorial is just an introduction to the basic concepts.



If you want to learn more about it, check out these resources:




