Web scraping is a way to extract data from the web using automation tools and technologies. Until recently, businesses were fairly casual about web data collection, but since GDPR came into force, due diligence around data extraction is a must. Recently, Poland imposed a fine of about €220,000 on an organization that collected the data of around 7 million people but failed to inform them (informing individuals is required under Article 14 of GDPR). A few months ago, the French DPA (CNIL) also issued guidance on commercial web scraping. So we thought of explaining what GDPR means and why it matters to the scraping community. Read this article to learn what you need to be GDPR compliant while scraping the web.
When Does GDPR Come Into Play?
First, let’s look at what can be scraped from the web, and then discuss which types of data fall under GDPR and which do not.
You can scrape:
- Real-estate ads to carry out personalized marketing,
- Stock indexes, news portals for market intelligence,
- Job postings to fuel your HR services,
- Social media sites to analyze customer sentiments,
- Online directories for prospecting,
- Public data from government websites for insights,
- Product data from eCommerce sites for competitor tracking and price intelligence,
- Blogs, videos, and more.
Of course, the use cases of data extraction are not limited to these, but broadly this gives you an idea of the different types of data you can scrape. Now, GDPR, which stands for General Data Protection Regulation (EU) 2016/679, is a European Union (EU) law on data protection and privacy for all individuals within the EU and EEA. GDPR serves two purposes:
- Puts individuals in control of how their data is used
- Simplifies the regulatory environment for businesses operating in the EU region
The question is, where do data scraping and GDPR cross paths? When should you care about GDPR? The short answer: whenever you’re scraping the personal information of individuals residing in the EU.
To know whether you need to comply with GDPR, and to ensure your scraping project is GDPR compliant, answer the following questions:
- What Qualifies As Personally Identifiable Information (PII)?
- Are you scraping the PII of EU citizens?
- Do you have a lawful ground for scraping personal data?
- What can you do to be GDPR Compliant?
What Qualifies As Personally Identifiable Information (PII)?
Any piece of data that can help someone trace or identify an individual qualifies as PII. Examples include:
- Contact number
- Postal address
- Credit card details
- Bank details
- IP address
- Individual’s picture/video/audio
- Medical Reports
- Employment details, etc.
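As a rough illustration of how a scraping pipeline might screen records against a list like the one above, here is a minimal Python sketch. The field names in `PII_FIELD_NAMES`, the function `find_pii`, and the sample record are all hypothetical, and the regular expressions are deliberately simple. A match is a prompt for human review, not a legal determination.

```python
import re

# Hypothetical field names treated as PII, loosely mirroring the list above.
PII_FIELD_NAMES = {
    "phone", "contact_no", "postal_address", "credit_card",
    "bank_account", "ip_address", "photo", "medical_report", "employer",
}

# Deliberately simple patterns for PII hidden inside free text (assumptions).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_pii(record: dict) -> dict:
    """Return {field: reason} for every field that looks like PII."""
    flagged = {}
    for field, value in record.items():
        if field.lower() in PII_FIELD_NAMES:
            flagged[field] = "field name"
        elif isinstance(value, str) and EMAIL_RE.search(value):
            flagged[field] = "email pattern"
        elif isinstance(value, str) and IPV4_RE.search(value):
            flagged[field] = "ip pattern"
    return flagged


record = {
    "title": "Flat for rent",
    "phone": "+48 123 456 789",
    "comment": "posted from 192.168.1.10",
    "price": 1200,
}
print(find_pii(record))
```

A screen like this only catches the obvious cases; names, photos, and free-text details still need a manual look.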
Are You Scraping the PII of EU Citizens?
GDPR is strictly concerned with the personally identifiable information of individuals within the European Union (EU) and the European Economic Area (EEA). So the next question is: are you scraping the data of European citizens (strictly speaking, of anyone located in the EU or EEA, whatever their nationality)? If the answer is ‘No’, then you’re safe as far as GDPR is concerned. For example, if you’re scraping data that concerns people in India, the USA, or Australia, you need not worry about GDPR; instead, you should look at the data protection laws of the respective jurisdiction. GDPR’s jurisdiction is limited to the EEA. If your scraping project requires the PII of EU citizens, you must have a lawful ground for collecting it.
Do You Have a Lawful Ground for Scraping Personal Data?
The lawful bases are set out in Article 6 of GDPR. There are six lawful bases for processing scraped data:
1. Consent
This can be your legal basis when the individuals whose data you’re scraping have given you consent to process it for specified purposes.
2. Contract
A contract with the individuals concerned can be a legal basis under GDPR if performing the contract necessarily requires you to process the data.
3. Legal Obligation
The third type of legal basis could be if the data processing is necessary for you to comply with a legal obligation.
4. Vital Interests
You can claim vital interests as the legal basis for your scraping project if it is intended to save someone’s life.
5. Public Task
When the data processing is carried out in the public interest or in the exercise of your official duties, it counts as a legal basis.
6. Legitimate Interest
If the data processing is necessary for the legitimate interest of the data controller, you can also count it as a legal basis for processing data under GDPR. However, it will not hold if that interest is overridden by the fundamental rights or interests of the individual whose data is being collected and processed.
To sum up, consent and contract work in much the same way: if individuals have agreed, it’s okay to process their data for the agreed purpose. When does this apply? Let’s take an example. Say a fashion-retail website collects product reviews from shoppers, along with the shoppers’ PII, and makes both publicly available under its reviews. The PII could be age, name, and location. The general data would be the review text and time. If you only need to scrape the review text for research to fuel your new product development, you need not worry about GDPR. But if you’re also scraping name, age, location, and other details, you’re entering the PII zone and need to be GDPR compliant.
Vital interests, public task, and legal obligation will rarely form your legal basis, as they are clear-cut concepts that leave little room for argument. Legitimate interest could potentially be a strong legal ground if you’re web scraping, but for most companies, claiming it successfully is a challenge.
The hiQ Labs v. LinkedIn case is an interesting read too.
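The review example can be sketched in a few lines of Python: keep only the non-personal fields at the moment of collection, so the PII is never stored in the first place. This is a minimal sketch under assumptions. The field names (`review_text`, `review_time`, `name`, `age`, `location`) and the sample record are hypothetical, and review text can itself contain personal details, so an allow-list like this does not settle the question on its own.

```python
# Fields assumed to be non-personal in the review example (hypothetical names).
ALLOWED_FIELDS = ("review_text", "review_time")


def minimise(review: dict) -> dict:
    """Keep only allow-listed fields; everything else is dropped, not stored."""
    return {key: review[key] for key in ALLOWED_FIELDS if key in review}


# A made-up scraped record that mixes PII with general data.
scraped = [
    {
        "name": "Anna K.",
        "age": 31,
        "location": "Lyon",
        "review_text": "Runs small, order a size up.",
        "review_time": "2021-03-02",
    },
]

cleaned = [minimise(review) for review in scraped]
print(cleaned)
```

An allow-list is used rather than a block-list so that any new field the site adds later is excluded by default.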
What Can You Do to Be GDPR Compliant?
Here’s a checklist for you to ensure your scraping and data processing project is GDPR compliant:
Avoid Misinterpreting the GDPR Articles
One myth is that any publicly available PII can be scraped and used for marketing or other purposes. This is not the case. Even when PII is publicly available, only consent or legitimate interest can serve as the legal basis for processing it. So you can’t launch marketing campaigns to email IDs obtained from the comment sections of social media if they belong to EU citizens/websites.
Obtain Consent
This is a no-brainer. If you do not have a solid legitimate interest in obtaining PII, getting consent is the only way out.
Inform Individuals About Data Collection
Article 14 of the GDPR requires you to inform individuals when their data has not been collected directly from them.
Ensure Data Subject Rights Are Preserved
EU residents have the right to request a copy of the data you hold about them, withdraw consent to scrape or keep their data, and even request deletion of their data. You need to ensure that your project can honor Data Subject Access Requests (DSARs).
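To honor access and deletion requests, scraped personal data has to be stored so that everything about one person can be found and removed together. The sketch below shows one way to do that in Python, assuming records are keyed by a subject identifier. The class `SubjectStore` and its methods are hypothetical, and a real project would use a database and also cover backups and downstream copies.

```python
from copy import deepcopy


class SubjectStore:
    """In-memory stand-in for wherever scraped personal data is kept."""

    def __init__(self) -> None:
        self._records = {}  # subject_id -> list of scraped records

    def add(self, subject_id: str, record: dict) -> None:
        self._records.setdefault(subject_id, []).append(record)

    def export(self, subject_id: str) -> list:
        """Access request: return a copy of everything held on the subject."""
        return deepcopy(self._records.get(subject_id, []))

    def erase(self, subject_id: str) -> int:
        """Deletion request: drop the subject's records, return how many."""
        return len(self._records.pop(subject_id, []))


store = SubjectStore()
store.add("subject-1", {"email": "a@example.com"})
store.add("subject-1", {"city": "Berlin"})

print(store.export("subject-1"))  # copy handed to the individual
print(store.erase("subject-1"))   # number of records removed
```

Keying by subject from the start is the design choice that matters here; retrofitting it onto flat scrape dumps is much harder.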
Report Data Breach
Article 33 of the GDPR requires you to notify the supervisory authority of a personal data breach within 72 hours of becoming aware of it, unless the breach is unlikely to result in a risk to the rights and freedoms of individuals.
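The Article 33 window is 72 hours from the moment you become aware of the breach, so it helps to compute the deadline as soon as an incident is logged. Here is a minimal Python sketch; the function names and the sample timestamps are hypothetical, and times are kept in UTC to avoid timezone mistakes.

```python
from datetime import datetime, timedelta, timezone

# Article 33 notification window, counted from awareness of the breach.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time by which the supervisory authority should be notified."""
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


# Made-up incident: the team became aware on 1 March 2024 at 09:00 UTC.
aware = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())
print(hours_remaining(aware, aware + timedelta(hours=24)))
```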
Data Protection Impact Assessment (DPIA)
If you can’t comply with Article 14 of GDPR, which requires informing individuals about the collection of their data, you need to carry out a DPIA.
Make sure your residential IP proxies are GDPR compliant too
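Companies and data scrapers often use residential IP proxies to scrape the web at scale or to get around the anti-scraping techniques that websites implement. Since an IP address is itself PII, check that your proxy provider has obtained these IPs with the consent of the people they belong to.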
Audit your new & old scraping projects regularly
It makes sense to audit all your scraping projects, old and new, to check whether they are GDPR compliant, and to intervene as and when necessary.
Data scraping has changed how businesses operate. Some new business verticals, like news aggregation, have grown out of web scraping. Although GDPR is still in its infancy, it has radically changed how businesses scrape the web. It is one of the most comprehensive and impactful data protection laws to date. If your scraping project needs you to scrape PII, make sure you’re GDPR compliant to avoid hefty fines.