On October 12, 2023, the French data protection authority (“CNIL”) released official guidance on how to comply with the General Data Protection Regulation (GDPR) during the development phase of an artificial intelligence system (“AI system”). The guidance focuses on the development stages of these systems rather than on their deployment phase.
In response to the numerous concerns voiced by AI stakeholders during the call for contributions launched on July 28, 2023, the CNIL has acted swiftly. The CNIL's clear intent is to assuage industry apprehensions by releasing an initial series of guidelines, the main message being that AI system development and privacy considerations can indeed coexist harmoniously. To quote the CNIL: "The development of AI systems is compatible with the challenges of privacy protection. Moreover, considering this imperative will lead to the emergence of devices, tools, and applications that are ethical and aligned with European values. It is under these conditions that citizens will place their trust in these technologies."
This guidance consists of seven “how-to” sheets offering insights on core GDPR principles as applied to the development phase of AI systems. Key takeaways include:
- Purpose limitation: AI systems using personal data or potentially impacting individuals must be developed and used for a specific, legitimate purpose. Businesses and organizations must therefore carefully consider the purpose of their AI system before collecting or using personal data, and make sure that this purpose is duly determined and compatible with GDPR principles. Purposes must be precisely described; an overly generic purpose such as "development and improvement of an AI system" is not considered valid. This also applies to general-purpose AI systems, although the CNIL acknowledges that the purposes of certain AI systems cannot be precisely determined at the development stage. In such cases, the type of system (e.g., a large language model or an image-generating AI system) and its main possible functionalities should at least be well described. This underscores the need for further clarity in describing the context and carrying out the legal assessment, both of which AI system stakeholders are expected to duly document.
- Data minimization: the CNIL outlines three scenarios: training without personal data, training with personal data, and training that may involve personal data. For mixed records (containing both personal and non-personal data), the GDPR applies. Crucially, only the personal data essential for the purpose of the AI system should be collected and used. According to the guidance, businesses and organizations should avoid collecting or using more data than is necessary for the AI system to function properly and should implement organizational and technical measures to purge unnecessary personal data (see the illustrative sketch after this list). The CNIL clarifies that this principle does not prevent the use of large databases, provided that the personal data used has been selected to optimize the training of the algorithms. While these provisions open up very interesting nuances, there is no doubt that they will require substantial efforts to craft comprehensive governance and practical compliance solutions.
- Data retention: aware that building training databases demands significant scientific and financial investment, and that such databases often become widely adopted standards within the community, the CNIL confirms that the principle of limited data retention does not prevent setting extended retention periods for training databases, as long as they can be justified by the legitimate purpose of the AI system. This statement offers greater flexibility to data controllers in retaining their training data.
- Data reuse: the CNIL emphasizes that the reuse of databases, including publicly available data, is possible for training AI systems, provided that (i) the data has not been collected in a manifestly unlawful manner and (ii) the purpose of the reuse is compatible with the initial purpose of the personal data collection. In this respect, the CNIL encourages stakeholders to check their sources before using them, and the Authority has offered an exhaustive step-by-step guide on the checks and verifications necessary for data reuse. Stay tuned to this blog, as we'll delve into a detailed analysis of this guidance shortly.
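To make the data minimization point more concrete, here is a minimal, purely illustrative Python sketch of what "technical measures" might look like in practice. It is our own assumption of one possible approach, not something prescribed by the CNIL: the column names, the pandas-based selection of only the fields needed for the documented training purpose, and the salted-hash pseudonymization of the remaining identifier are all hypothetical.

```python
import hashlib
import pandas as pd

# Illustrative only: column names and the pseudonymization scheme are
# hypothetical assumptions, not requirements set out in the CNIL guidance.

# Fields actually needed for the stated, documented training purpose.
NECESSARY_FEATURES = ["age_band", "region", "usage_metric"]

def minimize_training_data(raw: pd.DataFrame, salt: str) -> pd.DataFrame:
    """Keep only the personal data strictly necessary for training.

    - Direct identifiers (e.g. name, email) are never selected.
    - The user ID is replaced with a salted hash so records can still be
      deduplicated without retaining the original identifier.
    """
    minimized = raw[NECESSARY_FEATURES].copy()
    minimized["pseudo_id"] = raw["user_id"].astype(str).apply(
        lambda uid: hashlib.sha256((salt + uid).encode()).hexdigest()
    )
    return minimized

# Example usage with a toy record set:
raw_records = pd.DataFrame(
    {
        "user_id": ["u1", "u2"],
        "name": ["Alice", "Bob"],        # direct identifier: not needed, not selected
        "email": ["a@x.fr", "b@y.fr"],   # direct identifier: not needed, not selected
        "age_band": ["25-34", "35-44"],
        "region": ["IDF", "PACA"],
        "usage_metric": [12.3, 4.5],
    }
)
training_set = minimize_training_data(raw_records, salt="rotate-me-regularly")
print(training_set.columns.tolist())  # ['age_band', 'region', 'usage_metric', 'pseudo_id']
```

A sketch like this only addresses the technical side; the selection of "necessary" fields and the adequacy of the pseudonymization remain legal and governance assessments that must be documented on a case-by-case basis.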
In addition to these key issues, the CNIL's guidance also addresses a number of other topics in detailed fact sheets: defining the purpose, carrying out a data protection impact assessment (DPIA), determining controllership, choosing the legal basis, privacy by design, etc.
This first release is a highly valuable resource for businesses and organizations involved in AI systems, not only in France but in any jurisdiction subject to the GDPR. It unequivocally asserts that AI development and privacy considerations can coexist harmoniously, provided there is robust governance and vigilant content oversight.
This guidance also marks the beginning of a necessary evolution of the CNIL's doctrine to meet the challenges raised by AI, particularly in terms of personal data minimization and retention. As the CNIL has announced two more sets of guidance, AI stakeholders should keep a keen eye out for these forthcoming guidelines.
Authored by Amjad El Hafidi, Julie Schwartz, and Remy Schlich.