What is scraping?
Data scraping is a computer technique whereby a programme extracts information from human-readable computer sources.
Data scraping makes it possible to extract and structure data from unorganised sources, from which the information may be difficult to understand and extract.
There are many forms of scraping (parsing, report mining, screen scraping, etc.), but one of the most interesting is web scraping. It involves extracting information from web pages, usually in an automated way, using dedicated software.
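To make the idea concrete, here is a minimal sketch of web scraping using only Python's standard library. The HTML fragment, the class names ("contact", "name") and the helper names (ContactParser, extract_contacts) are assumptions made up for this illustration; a real page would have its own structure, and real tools usually add a download step.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, inlined so the example needs no network access.
SAMPLE_HTML = """
<ul>
  <li class="contact"><span class="name">Ada Example</span>
      <a href="mailto:ada@example.org">email</a></li>
  <li class="contact"><span class="name">Bob Sample</span>
      <a href="mailto:bob@example.org">email</a></li>
</ul>
"""


class ContactParser(HTMLParser):
    """Collects one record per <li class="contact"> element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "contact":
            # A new contact starts: open an empty record.
            self.records.append({"name": "", "email": None})
        elif tag == "span" and attrs.get("class") == "name":
            self._in_name = True
        elif tag == "a" and self.records:
            href = attrs.get("href") or ""
            if href.startswith("mailto:"):
                self.records[-1]["email"] = href[len("mailto:"):]

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_name = False

    def handle_data(self, data):
        # Text inside <span class="name"> is the contact's name.
        if self._in_name and self.records:
            self.records[-1]["name"] += data.strip()


def extract_contacts(html):
    """Turn unstructured HTML into a list of structured records."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    return parser.records


contacts = extract_contacts(SAMPLE_HTML)
```

The output is a list of dictionaries with a name and an email address for each entry, which is exactly the kind of identification and contact data that makes the practice legally sensitive.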
How is scraping used?
The technique can be used for mass resale, by companies specialising in collecting and reselling data, or for direct commercial canvassing, by companies collecting data on their own behalf.
Most often, the data collected is identification or contact data, such as first and last name, telephone number, email address or home address.
Is web scraping legal outside Europe?
The massive use of web scraping by the internal services of social networks and the need for companies to use this data to achieve their commercial objectives have recently led the American courts to rule in favour of the use of such tools.
The principle under US law is that companies can carry out any operation involving data as long as the law does not prohibit it. The hiQ Labs v. LinkedIn decision of 18 April 2022 gives precedence to the right to conduct business over the protection of users' privacy. The court considered that people who choose to post personal information on their public profiles cannot claim that this data should remain private and shielded from use by tools such as scraping.
The approach is reversed in Europe, where the data controller must rely on an appropriate legal basis before any use is made of the data. Otherwise, the processing is, in principle, unlawful.
In Europe, doesn't the greater protection afforded to personal data by the GDPR preclude such uses?
Data on platforms or social networks is very often public, as no identification is required to consult it. However, it is still personal data. As a result, the processing of such data falls within the scope of the GDPR when it applies, and the users of such tools become data controllers.
A number of its articles are likely to be violated by such a practice.
As a data controller subject to the GDPR, can we still use scraping on platforms such as LinkedIn? In what context?
Web scraping is a tool governed by the GDPR, and its use is subject to three main conditions:
1) Do not violate the general terms and conditions of use of the platform from which the data originates.
The first difficulty lies in the platform's terms and conditions of use. LinkedIn's T&Cs, for example, state: "You agree not to develop, support or use software, devices, scripts, robots or any other means or process designed to perform web scraping of the Services or otherwise copy profiles and other data from the Services".
Thus, using scraping on LinkedIn exposes the user to penalties, especially as the platform has implemented powerful algorithms to detect these tools.
2) Comply with GDPR obligations regarding direct commercial prospecting.
For commercial canvassing of individuals, the principle is that direct canvassing is prohibited unless the individual has been informed and has given prior consent. Consent is therefore, in principle, the only possible legal basis for commercial prospecting.
The only exception to this principle is when the individual, in the context of the platform, can reasonably expect their data to be re-used for this purpose.
On the subject of people's expectations and the use of scraping on LinkedIn, a CNIL deliberation of 8 December 2020 is particularly enlightening.
Nestor, a company selling meals in the workplace, used a web scraping tool to build up a customer database via the LinkedIn network. It invoked its legitimate interest in prospecting professionals as the legal basis for building and using this database. Indeed, professional-to-professional (B2B) canvassing on subjects related to the recipients' business can be carried out without their consent.
However, the CNIL first considered that "the canvassing messages for the sale of meals at people's place of work have little connection with the professional activity of the prospects", and then ruled that there had been a breach of the GDPR in that the company had failed to fulfil its obligations to inform individuals and obtain their consent. The company was therefore unable to rely on its legitimate interest in carrying out this data processing.
3) Comply with the main principles of the GDPR applicable to all data processing.
Typical breaches include failing to inform individuals, failing to obtain their consent, or failing to respect their right to object, in particular when they have already objected to any re-use of their data for canvassing purposes.
In addition, if the company uses a service provider to supply the tool, it must ensure that the obligations of Chapter IV of the GDPR concerning processors are met.
Lastly, it may be compulsory in certain cases to carry out a data protection impact assessment (PIA / DPIA). Even where it is not compulsory, carrying one out is recommended given the characteristics of such processing.
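The right to object mentioned above is often enforced in practice with a suppression list that is checked before any canvassing message is sent. The sketch below illustrates the idea only; the function names, the record layout (a dictionary with an "email" key) and the sample addresses are assumptions, and such filtering does not by itself make a processing operation lawful.

```python
def normalise_email(email):
    """Compare addresses case-insensitively and ignore surrounding spaces."""
    return email.strip().lower()


def remove_objectors(prospects, objections):
    """Return only the prospects who have not objected to canvassing.

    prospects:  list of dicts with an "email" key (hypothetical layout)
    objections: iterable of email addresses that exercised the right to object
    """
    blocked = {normalise_email(e) for e in objections}
    return [p for p in prospects if normalise_email(p["email"]) not in blocked]


# Made-up sample data for illustration.
prospects = [
    {"name": "Ada Example", "email": "ada@example.org"},
    {"name": "Bob Sample", "email": "Bob@Example.org "},
]
objections = ["bob@example.org"]

allowed = remove_objectors(prospects, objections)
```

Normalising addresses before comparison avoids contacting someone again simply because their address was stored with different capitalisation.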
What are the penalties for non-compliance with these obligations?
Under the GDPR, a company can be sanctioned for violating Articles 5, 12 and 13 (principles of processing and rights of individuals). Article 83 provides for an administrative fine of up to €20 million or, in the case of an undertaking, up to 4% of its total worldwide annual turnover for the preceding financial year, whichever is higher.
In addition, national laws may provide for a specific offence. In France, for instance, Article 226-18 of the Penal Code punishes the "fraudulent, unfair or unlawful collection of personal data".
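As a worked illustration of this ceiling, the sketch below computes the maximum fine for a given turnover. It assumes the upper tier of Article 83, where the cap is the higher of the fixed amount and the percentage of turnover; the function and constant names are made up for this example, and the actual fine is set case by case by the supervisory authority.

```python
FIXED_CAP_EUR = 20_000_000  # fixed ceiling of EUR 20 million
TURNOVER_PERCENT = 4        # 4% of total worldwide annual turnover


def max_gdpr_fine(turnover_eur):
    """Upper bound of the administrative fine, in whole euros.

    Assumes the cap is whichever is higher: the fixed amount or the
    percentage of worldwide annual turnover.
    """
    percentage_cap = turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, percentage_cap)


small_company_cap = max_gdpr_fine(100_000_000)    # 4% is EUR 4 million
large_company_cap = max_gdpr_fine(1_000_000_000)  # 4% is EUR 40 million
```

For a company with a turnover of €100 million, the fixed €20 million ceiling applies; at €1 billion, the 4% rule raises the ceiling to €40 million.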
Any collection carried out fraudulently (e.g. without the knowledge of the persons concerned), regardless of whether the data is public or not, is punishable by five years' imprisonment and a fine of 300,000 euros.
Scraping can also constitute another offence: infringement of the rights of the database producer. Databases are protected by copyright and by a sui generis right (a right of its own kind) protecting the producer of the database (articles L. 112-3 and L. 341-1 of the French Intellectual Property Code).
The database producer may prohibit any extraction of a substantial part of the database, as well as its reuse by making it available to the public.
The penalties incurred are a 300,000 euro fine and 3 years' imprisonment.