How to Scrape Data from Craigslist
Monday, August 17, 2020
Why do people scrape Craigslist?
Craigslist gathers a vast amount of information. Some people are not satisfied with just browsing it and scrape data from Craigslist for a variety of reasons. Below are the 4 most typical ones.
1> Individuals can extract first-hand information about houses, cars, computers and much more. Once the data is exported to Excel sheets, it is much easier to look through and compare.
2> Craigslist, like Yellow Pages and Yelp, is full of potential business leads for revenue generation. Leads are undoubtedly valuable, especially qualified ones, which is probably why Craigslist appeals to so many people.
3> Gain profits by reselling goods. With scraped data in a well-structured form, people can better analyze prices and set a new one for reselling. However, reselling this way falls into a gray area, so it may not be a wise move: it is sometimes profitable, but the consequences may be unpleasant.
4> Monitor competitors. Craigslist is full of valuable information covering an array of industries, which businesses can use to keep track of their competitors. Staying informed of competitors' strategies in real time helps a business gain a competitive edge.
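Reasons 1 and 3 both come down to comparing prices once listings sit in a structured table. As a minimal sketch, assuming the scraped rows are already loaded as dictionaries with hypothetical `neighborhood` and `price` fields (the price still as raw text such as "$1,850"), the snippet below reports the median asking price per neighborhood using only Python's standard library:

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional


def parse_price(raw: str) -> Optional[int]:
    """Turn price text such as "$1,850" into 1850; None if it is not a plain dollar amount."""
    digits = raw.replace("$", "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def median_price_by_area(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Group rows by the hypothetical 'neighborhood' field and take the median price."""
    prices: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        price = parse_price(row.get("price", ""))
        if price is not None:  # skip listings without a usable price
            prices[row.get("neighborhood", "unknown")].append(price)
    # sorted() makes the output order stable (alphabetical by neighborhood)
    return {area: median(values) for area, values in sorted(prices.items())}


if __name__ == "__main__":
    sample = [  # made-up rows for illustration only
        {"title": "1BR near BART", "neighborhood": "Oakland", "price": "$1,850"},
        {"title": "Studio", "neighborhood": "Oakland", "price": "$1,500"},
        {"title": "2BR with parking", "neighborhood": "Berkeley", "price": "$2,700"},
        {"title": "Room for rent", "neighborhood": "Berkeley", "price": ""},
    ]
    print(median_price_by_area(sample))
```

The median is used rather than the mean so that a single mispriced or joke listing does not skew the comparison.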
Is scraping Craigslist illegal?
Craigslist is one of the most popular websites to scrape, and it has also proved to be one of the toughest. The reason is simple: unlike websites that offer APIs for retrieving data, Craigslist's API is not designed for pulling data off the site. On the contrary, it is used for posting data to Craigslist.
Like those of Facebook and LinkedIn, Craigslist's terms clearly state that all sorts of robots, spiders, scripts, scrapers and crawlers are prohibited. Craigslist also does not allow anyone to harvest its users' personal information from the site.
Craigslist has used various technological and legal methods to prevent being scraped for commercial purposes. In April 2017, for example, Craigslist obtained a $60.5 million judgment against RadPad, a rental listings company accused of scraping real estate listings. A few months later, Craigslist obtained another judgment, for $31 million, against Instamotor, claiming that Instamotor had scraped car listings from Craigslist for its own listing service and sent unsolicited promotional emails to Craigslist users.
Nevertheless, as an article titled "10 Myths about Web Scraping" puts it, scraping confidential information for profit is illegal, but if you scrape public data discreetly for personal use, you should be fine.
How to scrape data from Craigslist?
If you are a coder, you can follow this Python tutorial on scraping East Bay Area Craigslist for apartments. The code in this tutorial can be modified to pull from any region, category, property type, etc. Or you can check out this Scrapy tutorial to learn how to crawl Craigslist's "Architecture & Engineering" jobs in New York and store the data in a CSV file.
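For a sense of what those tutorials involve, here is a minimal Python sketch (not the code from either tutorial) that builds a search URL from a region, an optional subarea and a category code, then pulls titles, links and prices out of a results page with the standard library's `html.parser`. The URL layout and the `result-row`, `result-title` and `result-price` class names are assumptions based on 2020-era search pages, the `max_price` query parameter is a hypothetical filter, and the sample HTML is made up. The sketch parses a string and deliberately sends no requests.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urlencode


def build_search_url(region: str, category: str, subarea: str = "", **params: str) -> str:
    """Assemble https://<region>.craigslist.org/search[/<subarea>]/<category>?<params>.

    This URL layout is an assumption, not a documented interface.
    """
    path = f"/search/{subarea}/{category}" if subarea else f"/search/{category}"
    query = f"?{urlencode(params)}" if params else ""
    return f"https://{region}.craigslist.org{path}{query}"


class ResultRowParser(HTMLParser):
    """Collect title, link and price from search result rows.

    The class names checked here are assumed from 2020-era markup and will
    need updating whenever the page layout changes.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field that the current text belongs to

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "li" and "result-row" in classes:
            self.rows.append({"title": "", "url": "", "price": ""})
        elif self.rows and tag == "a" and "result-title" in classes:
            self.rows[-1]["url"] = attr.get("href") or ""
            self._field = "title"
        elif self.rows and tag == "span" and "result-price" in classes and not self.rows[-1]["price"]:
            # a row may repeat its price; keep only the first occurrence
            self._field = "price"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()


if __name__ == "__main__":
    print(build_search_url("sfbay", "apa", subarea="eby", max_price="2500"))
    sample_html = """
    <ul>
      <li class="result-row">
        <a href="https://example.org/apa/1.html" class="result-title hdrlnk">Sunny 1BR</a>
        <span class="result-price">$1,850</span>
      </li>
    </ul>
    """
    parser = ResultRowParser()
    parser.feed(sample_html)
    print(parser.rows)
```

Keeping URL building and HTML parsing separate means a layout change only touches the parser, and the parser can be tested against saved pages without hitting the site.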
The problem with the tutorials above is obvious: they are far too complicated for non-coders. If you have zero coding experience and want a simple, quick method, here is the trick: use an automated data scraping tool like Octoparse.
With the power of data scraping, we can extract all the information we want from Craigslist listings in a few clicks and easily export it to Excel, CSV, HTML and/or databases. I will walk you through extracting Craigslist real estate listings in 3 steps.
Real estate listing extracted from Craigslist
Craigslist data scraping with Octoparse
Step 1: Enter the target Craigslist URL to build a crawler
Enter the listing URL into the box, and Octoparse will start detecting the page data automatically. The data to be extracted is highlighted in red, and the preview section below lets you edit the data fields before extraction.
Step 2: Save the extraction setting
After making sure that the data fields are what we want, click “Save settings” and Octoparse will auto-generate a scraping workflow on the left-hand side.
Step 3: Run the extraction to get data
Finally, you only need to save the crawler and hit “Run” to start extraction. The scraping process can be done within 5 minutes.
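Octoparse handles the export for you once the run finishes. Readers who took the coding route from the tutorials mentioned earlier can cover the same export step with Python's standard library. A minimal sketch, assuming listings are held as dictionaries with hypothetical `title`, `price` and `url` keys, writes them to a CSV file (which Excel opens directly) and to a SQLite table; the file names are placeholders:

```python
import csv
import sqlite3
from pathlib import Path
from typing import Dict, List

# Hypothetical column set; adjust to whatever fields your scraper collects.
FIELDS = ["title", "price", "url"]


def export_csv(rows: List[Dict[str, str]], path: Path) -> None:
    """Write listings to a CSV file; the utf-8-sig BOM helps Excel detect the encoding."""
    with path.open("w", newline="", encoding="utf-8-sig") as fh:
        # extrasaction="ignore" drops any keys that are not in FIELDS
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def export_sqlite(rows: List[Dict[str, str]], conn: sqlite3.Connection) -> None:
    """Insert listings into a 'listings' table, ignoring URLs that are already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(title TEXT, price TEXT, url TEXT PRIMARY KEY)"
    )
    # Named placeholders are filled from each dict; every row needs all three keys.
    conn.executemany(
        "INSERT OR IGNORE INTO listings (title, price, url) "
        "VALUES (:title, :price, :url)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    listings = [  # made-up sample rows
        {"title": "Sunny 1BR", "price": "$1,850", "url": "https://example.org/apa/1.html"},
        {"title": "Studio", "price": "$1,500", "url": "https://example.org/apa/2.html"},
    ]
    export_csv(listings, Path("listings.csv"))
    conn = sqlite3.connect("listings.db")
    export_sqlite(listings, conn)
    conn.close()
```

Using the listing URL as the primary key lets repeated runs append new listings without duplicating ones already saved.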
Please note that even though this article guides you through extracting Craigslist data, you should always respect Craigslist's Terms of Service and scrape at a moderate frequency.
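For readers writing their own scripts, a moderate frequency can be enforced in code. The sketch below is a minimal, site-agnostic example: it checks URLs against robots.txt rules with the standard library's `urllib.robotparser` and keeps a minimum gap between requests. The five-second default, the user-agent string and the sample rules are placeholders rather than values any site documents, and since robots.txt and the Terms of Service are separate documents, the check covers only the former.

```python
import time
from typing import Callable, List, Optional
from urllib.robotparser import RobotFileParser


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(
        self,
        min_interval: float = 5.0,  # arbitrary placeholder, not a documented limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # when the previous request was released

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


def allowed(robots_lines: List[str], user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt lines that were fetched beforehand."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = ["User-agent: *", "Disallow: /private/"]  # made-up robots.txt rules
    agent = "example-research-bot"  # placeholder user agent
    throttle = PoliteThrottle(min_interval=1.0)
    for url in ["https://example.org/listings", "https://example.org/private/x"]:
        if not allowed(rules, agent, url):
            print("skipping", url)
            continue
        throttle.wait()
        print("would fetch", url)  # the HTTP request itself is left out
```

Passing `clock` and `sleep` in as parameters keeps the throttle testable without any real waiting.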
Data scraping tools are not limited to Craigslist listings; they are used in many scenarios, including marketing, e-commerce and retail, data science, equity and financial research, data journalism, academic research, risk management, insurance and many more. You can read more about business uses of web scraping in this article: 25 Hacks to Grow Your Business With Web Data Extraction.
Edited by Cici