
Scraping the Fortune 500 Company Job Boards Step by Step

Friday, August 16, 2019

I think you'll agree that LinkedIn has been remarkably successful at aggregating jobs and engaging professionals. Recruiters are more likely to search for candidates on LinkedIn than on all other recruiting platforms combined.

 

In fact, whoever owns the job-seeker market owns a multi-billion dollar business. Indeed, Monster, and ZipRecruiter know that. Even Google moved in for a share of the job market pie in 2017.

 

Companies keep spending money to find candidates who fit the right jobs. As a result, there is still great potential for us to explore in the job market.

 

In this article, I will walk you through the entire journey of building a Fortune 500 job board website from the ground up. I will also break down LinkedIn's business model for ideas to fuel your own business.

 

A job board website works as a media agent that matches the right candidates with prospective companies. Employers pay to post job listings on the site, and job seekers send resumes and cover letters to the companies that interest them. The quality and quantity of the job listings are, therefore, crucial for your website to survive. There are two approaches you can take to boost the volume of listings on your job board website:

1. Scrape job listings from the career section of companies’ websites

2. Scrape from job listing search engines, like Indeed and Monster.com

 

First Approach:

Because each company has its own website, we need to build a spider for every one of them. The traditional method is to write Python scripts with Beautiful Soup, which leads to high initial and maintenance costs. Since each website has a unique layout, we need to write an individual script for each company. Besides, a site is likely to change its structure over time, forcing us to rewrite the script and build a new spider. With so many websites to cover, keeping your job board sustained would take a whole troop of tech experts, and the high marginal cost of adding one more engineer is untenable for a business.
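To make that cost concrete, here is a minimal sketch of what one such hand-written spider might look like. The URL and CSS selectors are placeholders; every company's careers page uses different markup, so each spider needs its own version of this and must be rewritten after every redesign:

```python
# Minimal per-company spider: a sketch only. The URL and CSS selectors
# below are placeholders; each company's careers page uses different
# markup, so every spider must be written and maintained by hand.
import requests
from bs4 import BeautifulSoup

def scrape_careers_page(url):
    """Extract job titles and locations from one careers page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    jobs = []
    for card in soup.select("div.job-listing"):  # placeholder selector
        jobs.append({
            "title": card.select_one("h3.title").get_text(strip=True),
            "location": card.select_one("span.location").get_text(strip=True),
        })
    return jobs

if __name__ == "__main__":
    for job in scrape_careers_page("https://example.com/careers"):
        print(job)
```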

A web scraping tool comes in handy as the most effective alternative, at a much lower cost. It allows us to automate the entire scraping process without writing a script. Octoparse stands out as the best web scraping tool: it enables both first-time starters and experienced tech experts to extract data through a point-and-click visual interface.

 

Since there are 500 websites, I will take Facebook as an example in this article. (Here is the Fortune 500 list of companies' websites; feel free to take full advantage of it.)

 

[Screenshot: Facebook careers page]

 

As you can see, the webpage contains ten listings per page and spreads across multiple pages. We will click through each job listing and extract the job title, location, responsibilities, and minimum and preferred requirements. For a web page with a nested list (a list that contains further lists) like this, we can:

 

  • Collect the first layer of listing URLs to expedite the scraping process, especially when the website includes a large volume of listings.
  • Set up an automated crawler to scrape the detail pages.

 

1. The URLs follow a consistent pattern: a fixed hostname plus a page tag at the end, where the page number changes as you paginate. As such, we can copy the first page's URL into a spreadsheet and drag down to generate the full list of page URLs.

 

[Screenshot: spreadsheet with the generated list of page URLs]
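If you prefer code over a spreadsheet, the same list can be generated programmatically. A minimal sketch, assuming a hypothetical page query parameter at the end of the URL:

```python
# Generate the paginated URL list programmatically. The URL pattern is
# hypothetical; check how the target site actually encodes its page number.
base_url = "https://www.facebook.com/careers/jobs/?page={}"
page_urls = [base_url.format(page) for page in range(1, 51)]  # pages 1-50

print(page_urls[0])   # first page
print(page_urls[-1])  # last page
```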

 

2. Then we set up a crawler with this list of URLs using Octoparse.

With the built-in browser, we can extract target elements on the web page with a given command. In this case, we click one job listing on the page and choose "Select All" to create a loop item containing all the listings.

 

[Screenshot: creating a "Select All" loop item in Octoparse]

 

3. Then choose "Loop Click Each Element" to go through each detail page. 

4. Likewise, select the elements to extract from the detail page, including the job title, location, responsibilities, and minimum and preferred requirements. You should be able to get job listings extracted like this: DEMO_Facebook_Career_List
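For readers who would rather script it, the same nested-list pattern (collect the listing URLs first, then visit each detail page) looks roughly like this in Python. The selectors and URL structure are placeholders, not Facebook's actual markup:

```python
# Two-phase crawl of a nested list: phase 1 collects detail-page URLs
# from each listing page, phase 2 extracts the fields from every detail
# page. All selectors below are placeholders, not real Facebook markup.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_soup(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")

def collect_detail_urls(page_urls):
    """Phase 1: gather the URL of every job detail page."""
    detail_urls = []
    for page_url in page_urls:
        soup = get_soup(page_url)
        for link in soup.select("a.job-link"):  # placeholder selector
            detail_urls.append(urljoin(page_url, link["href"]))
    return detail_urls

def scrape_detail(url):
    """Phase 2: extract the fields we care about from one detail page."""
    soup = get_soup(url)
    return {
        "title": soup.select_one("h1.job-title").get_text(strip=True),
        "location": soup.select_one("span.location").get_text(strip=True),
        "responsibilities": soup.select_one("div.responsibilities").get_text(strip=True),
        "requirements": soup.select_one("div.requirements").get_text(strip=True),
    }

if __name__ == "__main__":
    pages = [f"https://example.com/careers?page={n}" for n in range(1, 6)]
    listings = [scrape_detail(u) for u in collect_detail_urls(pages)]
    print(f"Scraped {len(listings)} job listings")
```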

 

Following the same idea, we can create as many crawlers as needed with Octoparse. In addition, the risk of high maintenance costs is minimized: you can set a scraping schedule and deliver up-to-date job listings to your database through the API portal.
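As a sketch of what that delivery pipeline might look like on your end: the endpoint, token, and JSON response shape below are hypothetical stand-ins, so consult your scraping tool's API documentation for the real ones.

```python
# Sketch: pull freshly scraped listings from a scraping tool's REST API
# and upsert them into a local database. The endpoint, token, and JSON
# shape are hypothetical; consult your tool's API documentation.
import sqlite3
import requests

API_URL = "https://api.example-scraper.com/tasks/job-board/data"  # hypothetical
API_TOKEN = "YOUR_API_TOKEN"

def fetch_latest_listings():
    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["data"]  # hypothetical response shape

def save_listings(listings):
    conn = sqlite3.connect("jobs.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (url TEXT PRIMARY KEY, title TEXT, location TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO jobs VALUES (:url, :title, :location)",
        listings,  # expects dicts with url/title/location keys
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    save_listings(fetch_latest_listings())
```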

 

Second Approach:

Job search engines like Indeed and Monster.com provide a considerable number of job listings, and we can obtain job information from both large and small companies with a single crawler. On the other hand, sourcing from job search engines alone doesn't give you a competitive edge over the competition. The most approachable solution is to find a niche: instead of a website with a broad scope, narrow down to a specific group, getting creative based on supply and demand. In this case, I scraped 10,000 job listings, extracted their locations, and plotted them on a map to see how data science positions spread geographically.
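As a sketch of that analysis step, here is how one might tally the scraped listings by location before putting them on a map. The CSV filename and column name are assumptions about the export format:

```python
# Count scraped job listings per location before mapping them.
# Assumes a CSV export with a "location" column; both the filename
# and the column name are assumptions about the export format.
import csv
from collections import Counter

def top_locations(path, n=10):
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["location"] for row in csv.DictReader(f)).most_common(n)

for city, count in top_locations("data_science_jobs.csv"):
    print(f"{city}: {count}")
```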

 

[Map: geographic distribution of data science job listings]

 

Data science positions cluster predominantly in coastal areas, with Seattle and New York showing the highest demand. With that in mind, it could be an excellent opportunity to help tech companies find the right candidates through local data scientist communities.

 

I also have a video tutorial that shows how to scrape job listings.

 

Why is LinkedIn successful?

From a blip to a giant, LinkedIn has been sophisticated in its business strategy. Here are three factors, inspired by LinkedIn, that can benefit your business on many levels:

  • Find the right evangelists: a good way to kick off is to invite "champions" and industry leaders to evangelize your website. These champions have a charismatic effect that translates into attracting other top professionals.

 

  • Social network community: Subscribers carry more business value once they gather together. A community generates UGC (user-generated content), which attracts more quality users to share their ideas. These are assets that will increase your competitiveness.

 

  • Trustworthiness: The goal of a job board website is to help people land their careers. It's a little cliché to say that "helping others helps yourself," but it is the right mentality if you are in pursuit of a successful business.

 

Author: Ashley

Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in ways that empower companies and businesses with actionable insights. Read her blog here to discover practical tips and applications of web data extraction.

To read this content in Spanish, please click: Scraping el Comité de Trabajo de la Compañía Fortune 500. You can also read web scraping articles on the official website.

Sources:

https://towardsdatascience.com/influencer-marketing-using-web-scraping-568ef4c072c3
https://www.statista.com/statistics/976194/annual-revenue-of-linkedin/

Download Octoparse to start web scraping, or contact us with any questions about web scraping!
