Scrape posts from LinkedIn

Monday, January 14, 2019

In this tutorial, we will show you how to scrape the posts from LinkedIn.com.

To follow along, you may want to use this URL:

https://www.linkedin.com/search/results/content/?keywords=octoparse&origin=SWITCH_SEARCH_VERTICAL
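The URL above is a base path plus two query parameters, so the same search can be repeated for other keywords. As a minimal sketch using only Python's standard library (the helper name `build_search_url` is hypothetical and is not part of Octoparse), the URL could be assembled like this:

```python
from urllib.parse import urlencode

# Base path and "origin" value are copied from the tutorial URL.
BASE = "https://www.linkedin.com/search/results/content/"

def build_search_url(keywords: str) -> str:
    """Return a content-search URL for the given keywords."""
    query = urlencode({"keywords": keywords, "origin": "SWITCH_SEARCH_VERTICAL"})
    return f"{BASE}?{query}"

# Rebuilds the tutorial URL; spaces in keywords are encoded automatically.
print(build_search_url("octoparse"))
print(build_search_url("web scraping"))
```

The resulting string is what you would paste into the "Input URL" box in step 1.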

 

Here are the main steps in this tutorial:  [Download task file here]

  1. "Go To Web Page" - to open the targeted web page

  2. Dealing with infinite scrolling - to get more data from the listed page

  3. Create a "Loop Item" - to loop through and extract each post

  4. Extract data - to select the data you need to scrape

  5. Start data extraction - to run your task and get data

 

 

 

 

1) "Go To Web Page" - to open the targeted web page

  • Click "+ Task" to start a new task with "Advanced Mode"
  • Paste the URL into the "Input URL" box
  • Click "Save URL" to move on

 

This website requires us to log in first, so we need to enter our username and password before we can access the data we want. For details, see this tutorial: Extract Data behind a login.

 

 

 Tips!

Advanced Mode is a highly flexible and powerful web scraping mode. For people who want to scrape from websites with complex structures, like Amazon.com, we strongly recommend Advanced Mode to start your data extraction project.

 

 

 

 

2) Dealing with infinite scrolling - to get more data from the listed page

In this case, pagination is not available for loading more content, so we need to scroll down to the bottom of the page repeatedly to load all of the content.

  • Select "Scroll down to bottom of the page when finished loading" under "Advanced Options"
  • Set the "Scroll times" and "Interval" you need
  • Select "Scroll down to bottom of the page" as "Scroll way"
  • Click the "OK" button to save the result
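To make the two settings concrete, here is a rough sketch of what a "scroll N times, wait between scrolls" loop does. This is not Octoparse's implementation: the function `scroll_to_load`, the `FakeFeed` stand-in page, and the early stop when no new items appear are all assumptions made for illustration.

```python
import time
from typing import Callable

def scroll_to_load(scroll_once: Callable[[], None],
                   count_items: Callable[[], int],
                   scroll_times: int,
                   interval: float) -> int:
    """Scroll up to `scroll_times` times, pausing `interval` seconds each time.

    Stops early if a scroll loads nothing new. Returns the final item count.
    """
    seen = count_items()
    for _ in range(scroll_times):
        scroll_once()
        time.sleep(interval)   # give the page time to load new posts
        now = count_items()
        if now == seen:        # nothing new appeared: assume the end was reached
            break
        seen = now
    return seen

class FakeFeed:
    """Stand-in for a real page: each scroll reveals `batch` more posts."""
    def __init__(self, total: int = 18, batch: int = 5):
        self.total, self.batch, self.loaded = total, batch, batch

    def scroll(self) -> None:
        self.loaded = min(self.total, self.loaded + self.batch)

    def count(self) -> int:
        return self.loaded

feed = FakeFeed()
print(scroll_to_load(feed.scroll, feed.count, scroll_times=2, interval=0))
```

With `scroll_times=0` the loop body never runs and only the initially loaded posts are counted, which is why a missing "Scroll times" value yields no extra data.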

 

 

 

 Tips!

1. Make sure that you enter a value for "Scroll times"; otherwise Octoparse won't perform the scroll-down action. We suggest setting a relatively high value for "Scroll times" if you need more data.

2. Most social media websites use scroll-down-to-refresh to load more data. Click here to learn more about dealing with infinite scrolling.

 

 

 

 

 

 

 

 

3) Create a "Loop Item" - to loop through and extract each post

  • Scroll down and select the first post in the built-in browser

Make sure the whole block of the first post is shaded blue when you hover your mouse over it. Only then will the whole post block be highlighted in green after you click, covering all of its information such as author, title, and content.

  • Click the whole block of the second post

Octoparse will automatically recognize the other blocks and highlight them in green.

  • Click "Extract text of the selected element" on the "Action Tips" panel.

 

 

 Tips!

Normally we can just click "Select all sub-elements" on the "Action Tips" panel, but under certain circumstances (as in this case) Octoparse fails to generate that option. Instead, we create the loop first and then manually select the data to extract from each post in the next step.
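Conceptually, a Loop Item is one XPath that matches every post block on the page. The sketch below shows the idea with Python's standard library; the markup, class names, and XPath are hypothetical and far simpler than a real LinkedIn page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the loaded page.
PAGE = """
<div id="feed">
  <div class="post"><span class="author">Ann</span><p class="text">First post</p></div>
  <div class="post"><span class="author">Bob</span><p class="text">Second post</p></div>
  <div class="ad">Sponsored</div>
</div>
"""

root = ET.fromstring(PAGE.strip())

# One XPath matches every post block: this plays the role of the Loop Item.
posts = root.findall(".//div[@class='post']")

# Text of each whole block, similar to "Extract text of the selected element".
block_texts = [" ".join(t.strip() for t in post.itertext() if t.strip())
               for post in posts]
for i, text in enumerate(block_texts, start=1):
    print(i, text)
```

Selecting the whole block matters: if the selection covered only part of a post, fields outside that part could not be reached from inside the loop.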

  

 

 

 

 

 

4) Extract data - to select data you need to scrape

  • Click on the Data field
  • Click "Delete Data Field"
  • Click "Yes"
  • In the first item block, click the data you need to scrape.
  • Click "Extract text of the selected element" and rename the "Field name" column if necessary.
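The steps above amount to mapping field names to elements inside each post block. A minimal sketch of that mapping follows, assuming hypothetical markup and field names (`author`, `content`); it is an illustration, not what Octoparse generates.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the second post has no author element.
PAGE = """
<div id="feed">
  <div class="post"><span class="author">Ann</span><p class="text">First post</p></div>
  <div class="post"><p class="text">Second post</p></div>
</div>
"""

# Field name -> XPath relative to one post block (both are assumptions).
FIELDS = {
    "author": "./span[@class='author']",
    "content": "./p[@class='text']",
}

def extract_rows(page: str) -> list[dict[str, str]]:
    """Return one dict per post, with an empty string for missing fields."""
    root = ET.fromstring(page.strip())
    rows = []
    for post in root.findall(".//div[@class='post']"):
        row = {}
        for name, path in FIELDS.items():
            node = post.find(path)
            row[name] = (node.text or "").strip() if node is not None else ""
        rows.append(row)
    return rows

for row in extract_rows(PAGE):
    print(row)
```

Because every path is relative to the current block, a field that is missing from one post is left blank instead of being filled from a neighbouring post.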

 

 

 

Tips!

How can we check whether the XPath of the Loop Item is right?

Octoparse will automatically generate the XPath of the Loop Item. Since the layout of this web page is fairly simple, the XPath should be correct. Still, we can confirm this by scrolling down the page to load more content and then checking whether the number of items in the loop increases.

As we can see, when we scroll down the page manually, the newly loaded posts are picked up by the loop.
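The same check can be expressed in code: count the matches before and after more content loads. In this sketch the pages, the helper names, and both XPaths are hypothetical; the second XPath is deliberately too narrow to show what a failing check looks like.

```python
import xml.etree.ElementTree as ET

GOOD_XPATH = ".//div[@class='post']"   # matches every post block
NARROW_XPATH = "./div[1]"              # matches only the first child block

def make_page(n_posts: int) -> str:
    """Build a tiny stand-in page containing `n_posts` post blocks."""
    posts = "".join(f"<div class='post'>Post {i}</div>"
                    for i in range(1, n_posts + 1))
    return f"<div id='feed'>{posts}</div>"

def count_loop_items(page: str, xpath: str) -> int:
    """Return how many elements the loop XPath matches on the page."""
    return len(ET.fromstring(page).findall(xpath))

before, after = make_page(3), make_page(6)   # before and after scrolling

for label, xpath in [("good", GOOD_XPATH), ("narrow", NARROW_XPATH)]:
    n_before = count_loop_items(before, xpath)
    n_after = count_loop_items(after, xpath)
    print(label, n_before, n_after, "grows" if n_after > n_before else "stuck")
```

If the count stays the same after new posts load, the loop XPath is too specific and needs to be edited.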

 

 

 

 

 

 

5) Start data extraction - to run your task and get data

  • Click "Start Extraction"
  • Select "Local Extraction" to run the task on your computer

 

 

Below is the output sample

 

 

Was this article helpful? Feel free to contact us if you have any questions or need our assistance.

 

 

Author: Momo

Editor: Suire

 

 

 
