Scrape posts from LinkedIn

You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade, as the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!

In this tutorial, we will show you how to scrape posts from LinkedIn.com. To follow along, you can use this URL: https://www.linkedin.com/search/results/content/?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi

The main steps are outlined below, and you can download the sample task file here.


1. Create a Go To Web Page step - to open the target website

  • Paste the URL and click Start


2. Log in to the website - to access the data

  • Click on the Sign In button and choose Click URL to go to the log-in page

  • After the login page is loaded, click on the Email input box and choose Enter text

  • Click on the password input box

  • Input the LinkedIn email address and password in Textbox1 and Textbox2

  • Click Confirm

  • Click the Sign in button and choose Click Button

  • Set the AJAX timeout to 10s (a scripted equivalent of this login-and-wait flow is sketched below)
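
Outside Octoparse, the same open-page, log-in, and wait sequence can be sketched with Selenium. This is only an illustration of the logic; the locators (the Sign in link text, the username/password ids, the submit button) are assumptions and may change whenever LinkedIn updates its markup.

```python
# A minimal sketch of the login-and-wait flow, assuming Selenium 4 and Chrome.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

SEARCH_URL = ("https://www.linkedin.com/search/results/content/"
              "?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi")

driver = webdriver.Chrome()
driver.get(SEARCH_URL)                                 # step 1: open the target page

# Step 2: go to the login page and submit credentials (Textbox1/Textbox2).
driver.find_element(By.LINK_TEXT, "Sign in").click()   # assumed link text
driver.find_element(By.ID, "username").send_keys("you@example.com")
driver.find_element(By.ID, "password").send_keys("your_password")
driver.find_element(By.XPATH, "//button[@type='submit']").click()

# The 10s AJAX timeout is an upper bound on how long to wait for the page to
# respond after the click; WebDriverWait plays the same role in this sketch.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, "main"))
)
```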


3. Auto-detect webpage - to create the workflow

  • Select Auto-detect web page data

  • Click Create workflow

  • Click on Scroll Page, then set Scroll for one screen, the number of Repeats, and the Wait time (see the code sketch after this step for what these settings correspond to)

  • Check the data fields in Data Preview and delete unwanted fields or rename them if needed

    • Delete unnecessary data fields directly by clicking More and Delete field

    • Modify the data field names by double-clicking the headers
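
For reference, the scroll settings map onto a simple loop: "Scroll for one screen" scrolls down by one viewport height, "Repeats" is how many times to repeat, and "Wait time" is the pause between scrolls. A rough equivalent, reusing the Selenium `driver` from the sketch above (the numbers are placeholders; tune them to how many posts you want loaded):

```python
import time

REPEATS = 20     # "Repeats": how many times to scroll down
WAIT_TIME = 2    # "Wait time": seconds to pause so new posts can load

for _ in range(REPEATS):
    # "Scroll for one screen" = scroll down by one viewport height.
    driver.execute_script("window.scrollBy(0, window.innerHeight);")
    time.sleep(WAIT_TIME)
```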


4. Modify the XPath of Loop Item - to locate more posts

LinkedIn pages are quite complicated, and the auto-generated XPath does not locate every post reliably, so we need to update it manually.

  • Click on Loop Item and input the XPath:

    //ul[contains(@class,"reusable-search__entity-result-list")]/li

  • Click Apply
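
To sanity-check this Loop Item XPath outside Octoparse, you can run it against the rendered page source; each match should correspond to one post in the search results. A quick check with lxml (using the page source from the Selenium session above, or any saved copy of the page) might look like this:

```python
from lxml import html

LOOP_XPATH = '//ul[contains(@class,"reusable-search__entity-result-list")]/li'

tree = html.fromstring(driver.page_source)  # or parse a saved HTML file instead
posts = tree.xpath(LOOP_XPATH)
print(f"Loop Item XPath matched {len(posts)} posts")
```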


5. Extract the data - to choose the data you want

The auto-detection feature selects most of the data we need, but some fields still have to be selected manually.

  • Click on the element on the page

  • Choose Text

  • It takes a few more steps to extract the post URLs (a code-level equivalent is sketched after this list)

    • Click on the post title

    • Click "<<" icon

    • Select A

    • Click on link
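
In code terms, those clicks boil down to: inside each post, find the title's A element and read its link (href) rather than its text. Continuing the lxml sketch, and reusing the Post_URL path from step 6 below rewritten as a relative expression, that looks roughly like this:

```python
# `posts` is the list matched by the Loop Item XPath above.
for post in posts:
    anchors = post.xpath('.//div[contains(@class,"description-container")]/div/a')
    if anchors:
        print(anchors[0].text_content().strip())  # the element's Text
        print(anchors[0].get("href"))             # the post URL (link)
```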


6. Modify the XPath of data fields - to locate the data precisely

You may need to modify the XPath of data fields that do not show up in the data preview section; this makes the extraction more precise. Here are some prepared XPaths for you (a quick way to check them is sketched at the end of this step).

Post_URL: //div[contains(@class,"description-container")]/div/a

Content: //div[contains(@class,"feed-shared-update-v2__commentary")]

Comments: //li[contains(@class,"social-details-social-counts__comments")]

  • Switch to Vertical View

  • Paste in the XPaths one by one
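
Before running the task, it can help to spot-check these XPaths against the rendered page. A small sketch, continuing the lxml `tree` from the step 4 check, that prints how many matches each field XPath finds plus a sample value:

```python
FIELD_XPATHS = {
    "Post_URL": '//div[contains(@class,"description-container")]/div/a',
    "Content":  '//div[contains(@class,"feed-shared-update-v2__commentary")]',
    "Comments": '//li[contains(@class,"social-details-social-counts__comments")]',
}

for name, xpath in FIELD_XPATHS.items():
    matches = tree.xpath(xpath)
    print(f"{name}: {len(matches)} matches")
    if matches:
        # Anchors carry their value in @href; the other fields are plain text.
        value = (matches[0].get("href") if name == "Post_URL"
                 else matches[0].text_content().strip())
        print("  e.g.", value[:80])
```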


7. Run task - to get the data

  • Click Run to run your task either on your device or in the cloud

  • Select Standard Mode under the Run on your device section to run the task on your local device

  • Wait for the task to complete

Note: We do not recommend running LinkedIn tasks in the Cloud because the website may flag the login as coming from a suspicious IP.

Here is the sample output data, which can be exported in Excel, CSV, HTML and JSON formats.
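
If you post-process the exported rows yourself (for example, rows collected with the sketches above), pandas can write all four of those formats. The filenames below are placeholders, and to_excel additionally requires the openpyxl package.

```python
import pandas as pd

# Example rows in the shape produced by this task (placeholder values).
rows = [{"Post_URL": "https://www.linkedin.com/feed/update/...",
         "Content": "Sample post text", "Comments": "12 comments"}]
df = pd.DataFrame(rows)

df.to_excel("linkedin_posts.xlsx", index=False)      # Excel (needs openpyxl)
df.to_csv("linkedin_posts.csv", index=False)         # CSV
df.to_html("linkedin_posts.html", index=False)       # HTML
df.to_json("linkedin_posts.json", orient="records")  # JSON
```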
