Scrape Job Postings from Indeed.com

Wednesday, September 14, 2016 3:03 AM

 

Indeed is one of the most popular job posting websites. With web scraping, you can collect large volumes of job information and put it to use. In this tutorial, we will show you how to use Octoparse to scrape job posts from Indeed.com.

Before we get started, we need the URL of the target result page. To get it, search for a keyword and a location on Indeed and copy the URL of the results page.

Below is an example URL for demonstration:

https://www.indeed.com/jobs?q=devops&l=Dallas-Fort%20Worth%2C%20TX&radius=50
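
If you need result page URLs for several keywords or locations, they can also be assembled in a script. The snippet below is a minimal sketch, assuming the only parameters that matter are the three visible in the example URL (q, l and radius). The helper name build_indeed_url is hypothetical and is not part of Octoparse or Indeed.

```python
from urllib.parse import quote, urlencode

# Base path taken from the example URL in this tutorial.
INDEED_SEARCH_URL = "https://www.indeed.com/jobs"


def build_indeed_url(keyword: str, location: str, radius: int = 50) -> str:
    """Build a search result URL (hypothetical helper for illustration).

    Assumes the query parameters seen in the example URL:
    q = keyword, l = location, radius = search radius.
    """
    params = {"q": keyword, "l": location, "radius": radius}
    # quote_via=quote encodes spaces as %20 (not "+"), matching the example.
    return f"{INDEED_SEARCH_URL}?{urlencode(params, quote_via=quote)}"


print(build_indeed_url("devops", "Dallas-Fort Worth, TX"))
```

Called with "devops" and "Dallas-Fort Worth, TX", the helper reproduces the example URL above, which can then be pasted into Octoparse.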

 

The easiest way to scrape the website is to open "Task Templates" on the main screen of Octoparse and start from the ready-to-use Indeed template, which saves you the setup time. Enter the URL into the template, run it, and wait for the data to be extracted. For further details, see Task Templates.

 

If you would like to know how to build the task from scratch, you may continue reading the following tutorial. 

Here are the main steps in this tutorial: [Download task file here]

1. Go to Web Page - open the targeted web page

2. Auto-detect the web page - create the workflow

3. Set up the wait time for "Extract Data" - control scraping speed

4. Start extraction - run the task and get data

 

1) Go to Web Page - open the targeted web page

  • Enter the URL on the home page and click "Start"

 

2) Auto-detect the web page - create the workflow

  • Click "Auto-detect the web page data" on the Tips panel and wait for the detection to complete
  • Go to "Data preview" to see if you are satisfied with the current data output

 

  • You can delete unnecessary data fields directly by clicking the corresponding icon
  • You can also modify the data field names here directly by clicking the corresponding icon
  • Uncheck the "Add a page scroll" option
  • Click "Create workflow"

 

3) Set up the wait time for "Extract Data" - control scraping speed

  • Open the action settings of "Extract Data"
  • Click "Options"
  • Tick "Wait before action"
  • Set the wait time to 1-2 seconds
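
Octoparse applies this wait for you, so no code is needed in the tool itself. To illustrate the idea behind the setting, here is a minimal sketch, assuming a generic scraping loop written by hand: it pauses for a random 1-2 seconds before each action so that requests are not sent back to back. The names pick_wait and run_with_wait are hypothetical, and this is not how Octoparse is implemented internally.

```python
import random
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def pick_wait(min_s: float = 1.0, max_s: float = 2.0) -> float:
    """Return a random delay in seconds between min_s and max_s."""
    return random.uniform(min_s, max_s)


def run_with_wait(
    items: Iterable[T],
    action: Callable[[T], R],
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Run `action` on each item, waiting 1-2 seconds before each call.

    `sleep` is injectable so the pacing can be checked without real delays.
    """
    results: List[R] = []
    for item in items:
        sleep(pick_wait())  # "wait before action"
        results.append(action(item))
    return results
```

A call such as run_with_wait(urls, fetch), where fetch is your own extraction function, would then process one page every 1-2 seconds plus the time the extraction itself takes.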

 

 

4) Start extraction - run the task and get data

 

Here is sample data for your reference.

[Screenshot: Indeed data scraped by Octoparse]

 

 

Happy Data Hunting!

Author: The Octoparse Team
