Q: How do I get the current page URL when scraping in Octoparse?

 

Description:

How do I add the current page's URL as one of my data fields when building a scraping task in Octoparse?

  

A:

The simplest method:

You can add the current page's URL when you are in the "Extract Data" action:

1. Click "Add Pre-defined Fields".

 

2. Choose "Add the current page URL".

 

3. The current page's URL will be added automatically under Define Fields. You can rename the data field.

 

Another method:

You can add the current page's URL when you are in the "Extract Data" action:

1. Click anywhere on the web page (for example, a blank area) ➜ Choose "Extract text", and a data field will be generated automatically ➜ Click "Save".

 

2. Select the "Customize Field" button ➜ Choose "Define data extracted" ➜ Choose "Extract page URL" under the "Extract data from browser" option ➜ Click "OK" ➜ Click "Save". The data field will now contain the current page's URL. You can rename the data field if necessary.
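
If you post-process exported data in a script, the result of either method (every extracted row carrying the URL of the page it came from) can be pictured with a short Python sketch. This only illustrates the idea and is not how Octoparse works internally; the function name add_page_url, the field name Page_URL, and the sample URL are all hypothetical.

```python
from typing import Dict, List


def add_page_url(rows: List[Dict[str, str]], page_url: str,
                 field_name: str = "Page_URL") -> List[Dict[str, str]]:
    """Return copies of the rows with the page URL added as an extra field.

    field_name is a hypothetical column name; rename it as you would
    rename the data field in the task.
    """
    # Build new dicts so the original rows are left unchanged.
    return [{**row, field_name: page_url} for row in rows]


# Hypothetical rows extracted from a single page.
rows = [{"title": "Item A"}, {"title": "Item B"}]
result = add_page_url(rows, "https://example.com/list?page=2")

for row in result:
    print(row)
```

Each printed row keeps its original fields and gains one more field holding the page URL, which is what the extra data field looks like in the extracted data.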

 

 
