Extract Information from LinkedIn Public Data 2
Monday, June 27, 2016 6:35 AM
(You can download my extraction task for this tutorial HERE in case you need it.)
Welcome to Octoparse’s tutorial.
If you have already signed in to your LinkedIn account, open the page directly in the built-in browser.
Click the search bar, ➜ and select “enter text value”.
Type in the keywords you want to search. ➜ Then click “Save”.
Click the search button, ➜ and select “Click an item”.
I'll take just one page of results as an example.
Click the first profile image. ➜ Select “Create a list of items”. ➜ “Add current item to the list”. ➜ “Continue to edit the list”.
Click the second profile image. ➜ “Add current item to the list”. ➜ Then “finish creating list”. ➜ Click “Loop to process the list”.
Next, set the “Ajax time out”.
Choose “load page with Ajax”. (I chose 5 seconds. You can choose a longer time than this.)
Then click “Save”.
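To give a rough idea of what a timeout like this does, here is a minimal Python sketch. It is not Octoparse's code; the function name `wait_for` and its parameters are hypothetical and only illustrate the general idea of waiting up to a fixed number of seconds for dynamically loaded content before moving on.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             timeout: float = 5.0,
             poll_interval: float = 0.25,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Hypothetical helper for illustration only. Returns True if the
    condition was met in time, False if the timeout expired first.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True          # content is ready, continue with the next step
        if clock() >= deadline:
            return False         # gave up waiting, the page took too long
        sleep(poll_interval)     # wait a little before checking again
```

If the timeout is shorter than the time the page really needs, the check gives up before the content is there, which is why a longer value can help on slow pages.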
Then start extracting data.
Click on the information you want to grab. ➜ Then select “Extract text”.
Once you have finished configuring the extraction rule, click “Next” and run the task on your computer by selecting “Local extraction”.
Now you can see that the information I clicked on has been extracted.
Many of you said you couldn't get results by following the previous LinkedIn tutorial. There are several possible reasons for this.
One of the reasons is that you are scraping the site at a disruptive rate, without regard for the load you're placing on its server.
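If you ever write your own crawler, one common way to avoid a disruptive rate is to enforce a minimum pause between requests. The sketch below is an assumption-laden illustration in Python, not part of Octoparse; the class name `Throttle` and the 2-second interval are made up for the example.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between consecutive requests (illustrative only)."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay


# Usage: call throttle.wait() before every page request.
throttle = Throttle(min_interval=2.0)  # 2 seconds is an arbitrary example value
```

The first call never sleeps; every later call sleeps only for whatever part of the interval has not already passed.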
LinkedIn's User Agreement includes a term stating that you can't scrape or copy profiles by any means, including crawlers. You risk having your account shut down or banned if you violate the agreement. I used LinkedIn as an example only to show that Octoparse is capable of grabbing data from different kinds of websites.
For more information please check out: https://www.linkedin.com/legal/user-agreement?trk=hb_ft_userag
I suggest you check the Terms of Service of any website you plan to crawl for clauses related to scraping its intellectual property.
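Terms of Service have to be read by a person, but many sites also publish a machine-readable robots.txt file that says which paths crawlers may visit. As a small complement to reading the terms, the Python sketch below checks a URL against robots.txt rules using the standard library's `urllib.robotparser`. The sample rules, the `my-crawler` user agent and the example.com URLs are invented for illustration, and passing this check does not mean a site's Terms of Service allow scraping.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used here so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, not from a URL
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler",
                     "https://example.com/private/page")
allowed = is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler",
                     "https://example.com/public/page")
```

For a real site you would load its actual robots.txt (for example with the parser's `set_url` and `read` methods) instead of the sample text.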
Another reason is that you didn't set the Ajax time out.
Author: The Octoparse Team