
Copyright provisions in the AI Act: generative AI, transparency, and data mining

09 February 2024

Although the AI Act is not intended to regulate copyright, the recently leaked version acknowledges the essential role of data in developing AI systems and sets out specific requirements and limitations to balance innovation with copyright and related rights. To that end, the AI Act refers to the EU copyright framework, and in particular to the “commercial” text and data mining (“TDM”) exception set out in Article 4 of Directive (EU) 2019/790 (“EUCD”).

The guiding principles set out in the recitals, together with the specific provisions of the AI Act, can be summarised as follows:

Adherence to copyright law - providers placing general-purpose AI models on the EU market should ensure compliance with EU law on copyright and related rights and, in particular, identify and respect the reservations of rights expressed by right holders under the EUCD, irrespective of where the copyright-relevant training activities have occurred (Recitals 60i-j).

Copyright policy - providers must adopt a policy to respect Union copyright law. This includes identifying and respecting the reservations of rights expressed under Article 4 EUCD (Article 52c(c)).

Transparency reporting - providers will be obliged to draft and make publicly available a detailed summary of the content used for training their general-purpose AI models. This summary should be based on a template provided by the AI Office. The obligation also applies to providers of AI models that are made accessible to the public under a free and open license (Recital 60k and Article 52c(d)).

Monitoring by the AI Office - the AI Office will be given the role of monitoring compliance with the obligations to respect EU law on copyright and to publish the training data summary (Recital 60ka).

“New” obligations upon the providers of general-purpose AI models

Article 52c and Recitals 60i-j tie directly into Article 4 EUCD, which sets out the TDM exception. The TDM exception allows the reproduction and extraction, for the purposes of text and data mining, of works or other subject matter contained in networks or databases to which access has been lawfully obtained. This includes activities carried out for AI training purposes. The exception does not apply where that use has been expressly reserved by the right holder (opt-out).

The TDM exception

Yet the exact scope of Article 4 EUCD is still to be clarified. In particular, it is unclear whether the provision indirectly (i.e., by introducing an exception subject to opt-out) creates a new right against any type of extraction and reuse of data for TDM purposes, or whether the existing scope of database rights and copyright remains applicable. On the latter reading, database rights would apply only against substantial or systematic extractions, and copyright would apply only in the case of relevant reproductions. Separately, it remains to be seen how the validity of the opt-out mechanism will be interpreted, since Article 4 EUCD has been implemented differently across Member States. For example, Italy did not specify that opt-outs must be expressed in machine-readable formats.

The rise of generative AI and its clash with copyright holders appear to have given the commercial TDM exception a second wind as a tool to limit generative AI.

Against this background, the AI Act seems to reinforce the position that the extraction of text and data is not permitted where right holders have expressly reserved the use of the work or other subject matter (opt-out right).

Therefore, following that reading, when the right holder expresses an opt-out, AI system developers must obtain authorization for TDM regardless of where the data mining and training activities occur. This means that providers placing a general-purpose model on the EU market must respect opt-out rights even if the actual data mining takes place outside the EU.

Transparency reporting

The transparency obligation set out in the AI Act is intended to help right holders assess whether their opt-out rights are being respected. Indeed, based on Article 52c(d) and Recital 60k, it is likely that providers will be required to publicly disclose the datasets used in training models, including those protected by copyright. The AI Act also seems to specify that the summary of the content used must be generally comprehensive in scope, so that parties with a legitimate interest, including copyright holders, can exercise and enforce their rights under Union law (for example, by listing the main data collections or sets that went into training the model).

The new AI Office - which will monitor and supervise compliance with the obligations imposed on providers by the AI Act - should make available to providers a template for the summary of the content used for training.

Conclusions

The direction of the new AI Act seems clear: it imposes a new set of obligations on providers of general-purpose AI models, including in relation to copyright and data usage. A provider wishing to enter the EU market will need to (i) adopt new policies to comply with EU copyright law, (ii) request specific authorization for the use of copyright-protected content that is subject to an opt-out, even if such use took place outside the EU, and (iii) disclose a sufficiently detailed summary of the content used for training the model. Yet the AI Act does not address the substance of the TDM exception under Article 4 EUCD. Clarity will therefore be needed as to the exception's proper scope of application (the opt-out mechanism and the scope of the right).

Authored by Francesco Banterle and Andrea Schettino.