Scrape data on Instagram

Friday, September 21, 2018

Internet celebrities are now part of economic life: they drive the growth of online retail and promote products through social media advertising. Suppose we need to choose an Instagram celebrity to promote our product in a marketing campaign. How can we verify the stability of their browsing volume, their blogging style, and how frequently they post?

One good way is to use a web scraping tool, Octoparse, to scrape the data we want from an internet celebrity's Instagram page and analyze it according to our needs.

In this tutorial, we use the page https://www.instagram.com/izkiz/ as an example to show how to scrape data from Instagram.

 

Here are the main steps in this tutorial: [Download demo task file here]

1) "Go To Web Page" - to open the targeted web page

2) Create a pagination loop - to scrape all the results from multiple pages

3) Extract data - to select the data for extraction

4) Customize the data field - to revise the field name (Optional)

5) Save and start extraction - to run the task and get data

1) "Go To Web Page" - to open the targeted web page

 · Create the task with "Advanced Mode".

 · Paste the URL into the "Extraction URL" box and click "Save URL" to move on.

 · If the web page fails to load, change the browser via the "setting function".

2) Create a pagination loop - to scrape all the results from multiple pages

 · Click the first picture and change its tag name from "UL" to "A" at the bottom of the "Action Tips" panel (Click Here to learn how to select and extract data/URL/image/HTML in detail)

 · Click "Click the link" on the "Action Tips" panel

 · Click "Load the page with AJAX" on the "Customize" panel, and set the "AJAX timeout"

 · Click ">" and then "Loop click next page" 

 · Click "Load the page with AJAX" on the "Customize" panel, and set the "AJAX timeout"

 

 

Tips!

AJAX, short for Asynchronous JavaScript and XML, is a set of web development techniques that allows a web page to update parts of its content without refreshing the whole page.

Because AJAX content is updated without a page reload, Octoparse may never receive the signal to move on to the next step and can get stuck. Setting an "AJAX timeout" tells it how long to wait before continuing. To learn more about AJAX, click here.
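
To illustrate why a timeout is needed for AJAX-loaded content, here is a minimal Python sketch. It is not how Octoparse is implemented; the `wait_until` helper, the simulated page, and the timing values are all assumptions made for this example. The idea is simply: keep checking for new content, and give up once the timeout expires.

```python
import threading
import time


def wait_until(condition, timeout_s, poll_s=0.05):
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    return condition()


# Simulated page (hypothetical): more posts "arrive" 0.2 s after the click,
# without any page reload, which is what AJAX loading looks like to a scraper.
page = {"posts": 12}


def simulated_ajax_click():
    threading.Timer(0.2, lambda: page.update(posts=24)).start()


simulated_ajax_click()
loaded = wait_until(lambda: page["posts"] > 12, timeout_s=2.0)
print("new content detected:", loaded, "| posts on page:", page["posts"])
```

If the timeout is shorter than the time the content needs to arrive, the wait ends before the new posts appear, so a slightly generous timeout is usually the safer choice.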

 

 

 

 

3) Extract data - to select the data for extraction

 · Click "Go to Web Page"

 · Select the data you want and click "extract data" on the "Action Tips" panel

 

 

Tips!

Octoparse extracts only the URL of a selected image, not the image file itself. Sometimes it is necessary to adjust the tag name at the bottom of the "Action Tips" panel to get the correct information.
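
The effect of choosing a different tag can be sketched in Python with the standard `html.parser` module. The markup below is a simplified, hypothetical thumbnail list, not Instagram's real HTML, and the `AttributeCollector` class is a helper written for this example only. It shows that selecting the "A" element yields the link to the post, while selecting the "IMG" element yields the image URL.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup for two post thumbnails (not Instagram's real HTML).
SAMPLE_HTML = """
<ul>
  <li><a href="/p/EXAMPLE1/"><img src="https://example.com/img1.jpg" alt="post 1"></a></li>
  <li><a href="/p/EXAMPLE2/"><img src="https://example.com/img2.jpg" alt="post 2"></a></li>
</ul>
"""


class AttributeCollector(HTMLParser):
    """Collect one attribute from every occurrence of a given tag."""

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            value = dict(attrs).get(self.attribute)
            if value is not None:
                self.values.append(value)


def collect(html, tag, attribute):
    """Return the chosen attribute of every matching tag in `html`."""
    parser = AttributeCollector(tag, attribute)
    parser.feed(html)
    return parser.values


post_links = collect(SAMPLE_HTML, "a", "href")   # links to the posts
image_urls = collect(SAMPLE_HTML, "img", "src")  # URLs of the images
print(post_links)
print(image_urls)
```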

 

 

 

 

 

 

 

4) Customize the data field - to revise the field name (Optional)

The workflow is executed from top to bottom, and actions wrapped in a Loop Item are executed multiple times. Since we need to scrape data from every page, the "Extract data" step has to be moved inside the loop.

 · Drag the "Extract data" step into the "pagination loop"

 · Revise the field name by typing the new name directly into the field
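
Why the position of the "Extract data" step matters can be shown with a small Python analogy. This is only a sketch of top-down execution with a loop; the `open_post` and `extract` functions and the post names are hypothetical stand-ins, not Octoparse internals.

```python
# Hypothetical stand-in for the posts the pagination loop will visit.
posts = ["post-1", "post-2", "post-3"]


def open_post(post_id):
    """Stand-in for the click / next-page step inside the loop."""
    return {"id": post_id, "caption": f"caption of {post_id}"}


def extract(page):
    """Stand-in for the 'Extract data' step."""
    return page["caption"]


# Extract step placed AFTER the loop: it runs once, on the last page only.
rows_outside = []
for post_id in posts:
    current = open_post(post_id)
rows_outside.append(extract(current))

# Extract step placed INSIDE the loop: it runs once for every page.
rows_inside = []
for post_id in posts:
    current = open_post(post_id)
    rows_inside.append(extract(current))

print(len(rows_outside), len(rows_inside))
```

With the extraction step outside the loop only one row is collected, whereas inside the loop every visited page contributes a row.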

 

 

 

 

 

 

 

5) Save and start extraction - to run the task and get data

 · Click "Start Extraction" and select "Local Extraction" to start execution. Data will be automatically extracted by Octoparse.

 · When the task is completed, you can export the data extracted for further analysis.
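
As one possible follow-up analysis, the sketch below estimates posting frequency and how stable the engagement is from an exported file. Everything about the data is assumed for illustration: the column names `posted_at` and `likes`, the date format, and the values are made up, and likes are used only as a rough proxy for browsing volume. Adapt the names to whatever fields your task actually extracts.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical export: column names and values are invented for this example.
EXPORTED_CSV = """posted_at,likes
2018-09-01,1200
2018-09-04,1350
2018-09-07,1100
2018-09-10,1250
"""


def summarize(csv_text):
    """Compute simple posting-frequency and engagement-stability figures."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dates = sorted(datetime.strptime(r["posted_at"], "%Y-%m-%d") for r in rows)
    likes = [int(r["likes"]) for r in rows]
    # Days between consecutive posts, oldest to newest.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "posts": len(rows),
        "mean_days_between_posts": statistics.mean(gaps),
        "mean_likes": statistics.mean(likes),
        # A small standard deviation relative to the mean suggests stable engagement.
        "likes_stdev": statistics.stdev(likes),
    }


summary = summarize(EXPORTED_CSV)
print(summary)
```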

 

 

Was this article helpful? Contact us any time if you need our help!

 

 
