First of all, I am neither a lawyer nor a legal expert. This article is based only on my experience working at Octoparse. If you're facing a real legal problem, please seek legal assistance.
Web crawling, also known as web scraping or data scraping, is a technique in which a computer program collects large amounts of data from websites, extracting regularly formatted data and processing it into easy-to-read structured formats. Its uses, for businesses and individuals alike, are countless.
Is web crawling legal? Well, it depends. There’s a lot of uncertainty regarding the legality of web crawling.
If you’re doing web crawling for your own purposes, it is generally considered legal, as it is likely to fall under the fair use doctrine. The complications start if you want to use scraped data for others, especially for commercial purposes. To quote Wikipedia: eBay v. Bidder's Edge, 100 F.Supp.2d 1058 (N.D. Cal. 2000), was a leading case applying the trespass to chattels doctrine to online activities. In 2000, eBay, an online auction company, successfully used the 'trespass to chattels' theory to obtain a preliminary injunction preventing Bidder's Edge, an auction data aggregator, from using a 'crawler' to gather data from eBay's website. The opinion was a leading case applying 'trespass to chattels' to online activities, although its analysis has been criticized in more recent jurisprudence.
As long as you are not crawling at a disruptive rate and the source is public, you should generally be fine.
I suggest you check the websites you plan to crawl for any Terms of Service clauses related to scraping of their intellectual property. If a site says "no scraping or crawling", you should respect that.
Here are my suggestions.
1. Scrape websites discreetly. Don’t scrape at a disruptive or abusive rate, and pay attention to the load you're placing on the target servers.
2. Use the data discreetly. It's better for everyone. You may run into problems using scraped data that is copyrighted, so use the data only for legal purposes.
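To make the first suggestion concrete, here is a minimal Python sketch of one way to keep a crawler from hitting a server too quickly: wait a fixed minimum time between requests. The names `Throttle` and `crawl`, the `fetch` callback, and the delay value are all my own illustrations, not part of any particular tool, and the right delay depends on the site you are crawling.

```python
import time


class Throttle:
    """Enforce a minimum delay (in seconds) between consecutive requests.

    Hypothetical helper for illustration; the clock and sleep functions
    can be swapped out, which makes the class easy to test.
    """

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect min_delay; return seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


def crawl(urls, fetch, throttle):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is any function that takes a URL and returns the page content
    (an assumption here; plug in whatever HTTP client you use).
    """
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

For example, `crawl(urls, fetch, Throttle(min_delay=2.0))` would leave at least two seconds between requests. A fixed delay is the simplest choice; it says nothing about whether the site's Terms of Service allow crawling in the first place, so check those separately.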
Author: The Octoparse Team