Monday, December 27, 2021

Psst! You are reading a tutorial for Octoparse version 7.3, which is slowly on its way out. We strongly recommend that you update Octoparse to the latest version because the new version is more robust in dealing with complicated websites like Instagram. Also, check the updated tutorial in our new help center.


Continue reading if you still want to finish your task on version 7.3. In this tutorial, we are going to scrape data from Instagram, including post content, date, image URL, number of likes, and location.

To follow through, you may want to use the following URL:


Also available:


Here are the main steps in this tutorial: [Download demo task file here]

1) Go To Web Page - to open the target web page

2) Create a pagination loop - to scrape data from multiple posts

3) Extract data - to select the data for extraction

4) Customize the data field using the RegEx tool - to revise the field name (optional)

5) Save and start extraction - to run the task and get data


1) "Go To Web Page" - to open the target web page

· Create the task with "Advanced Mode".

· Paste the URL into the "Extraction URL" box and click "Save URL" to move on.

· Change the default built-in browser

The default built-in browser of Octoparse 7 is incompatible with Instagram. To have our target page loaded normally, we need to modify the browser setting.

· Click "Setting"

If you use Octoparse 7.0.2, save the task before modifying the settings.

· Switch the default built-in browser to Firefox 45.0.

· Click "Save" to apply the modified setting


2) Create a pagination loop - to scrape data from multiple posts

We can use the ">" button as the "Next page" button to go to the next post. Before creating the pagination loop, we first need to open the first post.

· Click the first post and click the "A" tag at the bottom of "Action Tips"

When you select an item with a URL, the selected tag is "A". Normally there is no need to modify it, as Octoparse automatically identifies the tags of selected items. In this case, however, we need to change the tag at the bottom of "Action Tips".

· Select "Click the link"


The first post is now open. However, because Instagram loads its content with AJAX, we need to set up AJAX Load for the "Click Item" action.

· Uncheck "Auto retry when no response"

· Check "Load the page with AJAX"

· Set up "AJAX Timeout"
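To see why a timeout matters for AJAX content, here is a minimal Python sketch of the general idea: after a click, keep checking whether the new content has appeared, and stop waiting once the timeout elapses. This is an illustration only, not how Octoparse is implemented. The function `wait_for_ajax`, its parameters, and the simulated page are all hypothetical.

```python
import time
from typing import Callable


def wait_for_ajax(is_loaded: Callable[[], bool], timeout_s: float, poll_s: float = 0.05) -> bool:
    """Poll is_loaded() until it returns True or timeout_s elapses.

    Returns True if the content appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_loaded():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Simulated page (assumption): the content "arrives" 0.1 s after the click.
clicked_at = time.monotonic()
loaded_in_time = wait_for_ajax(lambda: time.monotonic() - clicked_at >= 0.1, timeout_s=1.0)

# Simulated page that never finishes loading: we give up after the timeout.
gave_up = wait_for_ajax(lambda: False, timeout_s=0.2)
```

A timeout that is too short risks moving on before the post has loaded, while one that is too long slows down every step, so it is worth adjusting it to how quickly the page responds.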


Now we can create the pagination loop.

· Click the ">" button

· Click "Loop click next page" on the "Action Tips"


Instagram uses AJAX on the ">" button, so we need to set up AJAX Load for the "Click to Paginate" action as well.

· Click "Load the page with AJAX" on the "Customize Action"

· Set up "AJAX timeout"
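The logic of the pagination loop we just built can be sketched in Python as follows: handle the current post, click ">", and repeat until there is no next post. This is a simplified model under stated assumptions, not Octoparse's internals. `FakePostViewer` is a hypothetical stand-in for the post view and does not correspond to any Octoparse or Instagram API.

```python
from typing import Iterator, List, Optional


class FakePostViewer:
    """Hypothetical stand-in for the opened post view (assumes at least one post)."""

    def __init__(self, posts: List[str]) -> None:
        self._posts = posts
        self._index = 0

    def current_post(self) -> str:
        return self._posts[self._index]

    def click_next(self) -> bool:
        """Simulate clicking ">"; return False when there is no next post."""
        if self._index + 1 >= len(self._posts):
            return False
        self._index += 1
        return True


def paginate(viewer: FakePostViewer, max_posts: Optional[int] = None) -> Iterator[str]:
    """Yield the current post, then click ">" until it is no longer available."""
    seen = 0
    while True:
        yield viewer.current_post()
        seen += 1
        if max_posts is not None and seen >= max_posts:
            break
        if not viewer.click_next():
            break


visited = list(paginate(FakePostViewer(["post-1", "post-2", "post-3"])))
```

Note that the loop starts from whichever post is currently open, which is why the first post has to be opened before the loop is created.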



To learn more about dealing with AJAX in Octoparse, please refer to Deal with AJAX.


3) Extract data - to select the data for extraction

We are now on the second post. When creating a "Loop Item", we should always start with the first item on the first page. In this case, we should go back to the first post.

· Click "Go To Web Page" in the workflow

· Click "Click Item"

Octoparse opens the first post.

· Click the pagination loop in the workflow

By doing this, we can help Octoparse decide the execution order and generate the "Extract data" step at the appropriate position in the workflow.


Now, let’s start extracting data.

· Select the data you want

· Click "Extract data" on the "Action Tips"
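To make the shape of the extracted data concrete, here is a small Python sketch of one row per post with the five fields this tutorial collects (post content, date, image URL, number of likes, and location), written out as CSV. The class name, the column names, and the sample values are placeholders chosen for illustration. They are not Octoparse's export format or real Instagram data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class PostRecord:
    """One extracted post; field names are illustrative assumptions."""
    post_content: str
    date: str
    image_url: str
    likes: str
    location: str


def to_csv(records: List[PostRecord]) -> str:
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(PostRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values only, for demonstration.
sample = PostRecord(
    post_content="Example caption",
    date="2021-12-27",
    image_url="https://example.com/image.jpg",
    likes="10 likes",
    location="Example City",
)
csv_text = to_csv([sample])
```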



To learn more about how to adjust the workflow, please refer to Getting to know Octoparse.


4) Customize the data field - to revise the field name (optional)

· Revise the field name by typing or selecting from the pre-defined options
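If you prefer to tidy up the data after exporting it, the same kind of cleanup can be sketched in a few lines of Python. The example below renames columns with a mapping and, since the step overview mentions the RegEx tool, also shows a regular expression that pulls the number out of a likes string. The default names "Field1" and "Field2", the target names, and the format of the likes text are assumptions made for illustration.

```python
import re
from typing import Dict, Optional


def rename_fields(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of row with keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in row.items()}


def parse_likes(text: str) -> Optional[int]:
    """Extract the first number (thousands separators allowed) from the text."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Hypothetical default field names mapped to more descriptive ones.
mapping = {"Field1": "post_content", "Field2": "likes"}
row = rename_fields({"Field1": "Example caption", "Field2": "1,234 likes"}, mapping)
likes_count = parse_likes(row["likes"])
```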

5) Save and start extraction - to run the task and get data

· Click "Start Extraction"

· Select "Local Extraction" to start execution.


Below is the sample output.




