CLASS 46

Now in its twelfth year, Class 46 is dedicated to European trade mark law and practice. This weblog is written by a team of enthusiasts who want to spread the word and share their thoughts with others.

TUESDAY, 17 JUNE 2025
EUIPO’s study on GenAI and copyright

On 12 May 2025, the EUIPO released a detailed study analysing how generative AI (GenAI) impacts EU copyright law. Gabriele Engels, Chair of the MARQUES Cyberspace Team, explains more.

The study focuses on three areas: the use of copyrighted content as training data by GenAI models (GenAI input), the legal status of AI-generated content (GenAI output) and the broader implications for rights holders, AI developers and the copyright ecosystem.

Although designed for legal experts and policymakers, the findings are significant for tech companies, creators, platforms and rights holders. As companies adopt GenAI on a growing scale, legal and commercial questions around data use, authorship, licensing and enforcement are in urgent need of clear answers.

To help these stakeholders, the EUIPO plans to launch a Copyright Knowledge Centre in November 2025, aimed at helping rights holders manage the use of their works, informing EU policymakers and tackling key challenges.

GenAI inputs

AI models need vast amounts of data to train their algorithms. This is often collected through web crawling or scraping practices that may involve copyright-protected material. As a result, many of the measures used by copyright holders to control access to their works focus on combating this practice.

Legal background

Under the EU’s Copyright in the Digital Single Market Directive (DSM Directive), text and data mining (TDM), which in this context covers the reproduction of content for training purposes, is allowed in specific circumstances.

In this context, Article 4 of the DSM Directive gives rights holders the ability to “opt out” of TDM by clearly stating their objection, thereby reserving their exclusive rights. For this opt-out to be valid, the reservation must be made expressly, by the rights holder and in an appropriate manner, including by “machine-readable means” for content made publicly available online. Where such an opt-out exists, developers must obtain authorisation from the rights holder, usually in the form of a licence agreement, before using the content.

The EU’s Artificial Intelligence Act (EU AI Act) adds that GenAI providers must respect TDM opt-outs by copyright holders and disclose sufficiently detailed summaries of the training data they utilise. It also requires AI-generated content to be detectable in a machine-readable format.

Rights reservations

The Robots Exclusion Protocol (REP) is a technical tool that website owners use to express their rights reservations and manage web crawling and scraping activities. However, there is a consensus amongst stakeholders that REP doesn’t meet the DSM Directive’s standards for expressing copyright opt-outs in an appropriate manner.

It is seen as an incomplete and temporary solution due to its voluntary nature: it only works if scrapers choose to comply with it, which undermines its enforceability as a technical safeguard.

A widely acknowledged limitation is REP’s inherent lack of granularity and specificity regarding permitted uses. It requires website managers to actively configure and maintain restrictions, making implementation inconsistent across different sites.
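The two limitations described above can be illustrated with a minimal Python sketch using the standard library's `urllib.robotparser`. The crawler name `ExampleAIBot`, the paths and the `example.com` URLs are hypothetical and are not taken from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: "ExampleAIBot" and the
# paths below are assumptions, not names taken from the study.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return what the robots.txt rules say about this crawler and URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler would run this check before requesting a page.
print(may_fetch("ExampleAIBot", "https://example.com/articles/1"))
print(may_fetch("OtherBot", "https://example.com/articles/1"))
print(may_fetch("OtherBot", "https://example.com/private/draft"))
```

The check only returns an answer; nothing in the protocol stops a scraper that skips it. The rules are also expressed per crawler name and per path, so the file has no way to say that a page may be crawled for one purpose but not for another.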

No opt-out standard

No opt-out mechanism has emerged as a standard for rights holders to express their TDM rights reservations. Copyright holders use a mix of various legal measures (unilateral declarations, licensing constraints, website terms and conditions) and technical measures (metadata and content provenance protocols).

Legally driven measures are typically applied to specific copyright-protected works, but can also cover entire repertoires of works. Technically driven measures are categorised as either location-based (i.e. applied to a specific copy of a digital asset as hosted in a particular location) or asset-based (i.e. applied to the digital asset more broadly and replicated in every copy of that asset).
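The difference between location-based and asset-based measures can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the URLs, the byte strings and the function names are hypothetical, and an exact SHA-256 hash stands in for the embedded metadata or fingerprinting that asset-based tools might use in practice.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Simplified asset identifier: an exact hash of the file's bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical work and hosting locations, for illustration only.
PHOTO = b"bytes of a protected photograph"
ORIGINAL_URL = "https://example.com/gallery/photo.jpg"
MIRROR_URL = "https://mirror.example.net/copies/photo.jpg"

# Location-based: the reservation is tied to one hosted copy.
location_reservations = {ORIGINAL_URL}

# Asset-based: the reservation is tied to the asset and follows every copy.
asset_reservations = {fingerprint(PHOTO)}


def reserved_by_location(url: str) -> bool:
    return url in location_reservations


def reserved_by_asset(data: bytes) -> bool:
    return fingerprint(data) in asset_reservations


# The copy on the mirror site is missed by the location-based check
# but still caught by the asset-based one.
print(reserved_by_location(MIRROR_URL))
print(reserved_by_asset(PHOTO))
```

An exact hash would stop matching as soon as a file is resized or re-encoded, which is one reason this is only a stand-in for a real asset-based measure.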

Both approaches have their distinct advantages and limitations and rights owners often use a combination of measures.

However, these methods appear to give rights owners only the possibility to express their rights, not to enforce them. AI developers are responsible for respecting these choices and configuring their tools accordingly. The study anticipates the development of sector-specific standard practices.

GenAI outputs

The study further explores legal concerns around AI-generated content, noting that output depends on the type of GenAI model and the type of content created. New tools such as watermarking and digital fingerprinting are helping to identify synthetic content produced by GenAI systems and to meet transparency requirements under the EU AI Act.

Standard technologies

There is a trend of increased deployment of Retrieval-Augmented Generation (RAG) technologies that combine GenAI with real-time data retrieval, often from copyrighted sources.

While RAG boosts the relevance and efficiency of the data drawn, it introduces legal uncertainty, especially around licensing and database rights, since it differs from traditional training covered by TDM rules. This is particularly the case if dynamic content is read via web scraping.
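To show why RAG differs from training, here is a minimal Python sketch of the retrieval step. It rests on several assumptions: the names `Document`, `retrieve` and `build_prompt` are hypothetical, the sample texts are invented, simple keyword overlap stands in for the retrieval methods a real system would use, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # where the text was retrieved from
    text: str


def tokenize(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(query: str, corpus: list[Document], k: int = 1) -> list[Document]:
    """Rank documents by the number of words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & tokenize(doc.text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Copy the retrieved text into the prompt that is sent to the model."""
    context = "\n".join(f"[{doc.source}] {doc.text}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical corpus, for illustration only.
corpus = [
    Document("news.example/euipo", "The EUIPO published a study on copyright and generative AI"),
    Document("news.example/filings", "Trade mark filings rose last year"),
]

question = "EUIPO copyright study"
print(build_prompt(question, retrieve(question, corpus)))
```

In this sketch the source text is fetched and reproduced at the moment of the query rather than absorbed into model weights during training, which is the distinction behind the uncertainty described above.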

Technical and legal measures

Developers are adopting technical fixes to reduce the risk of copyright infringement in AI-generated content. These include:

Tools for comparing generated content with potential input sources
Filters to prevent duplicate outputs
Prompt rewriting and negative prompting
Differential privacy to prevent models from memorising data
Post-training tools such as “model editing” and “model unlearning” to remove or alter specific content

Prompt rewriting changes user inputs to prevent the generation of near-duplicate outputs, while negative prompting tells the model which elements to exclude from the generation, such as key features associated with copyright-protected characters.
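The first two measures in the list, comparing outputs with potential sources and filtering duplicates, can be sketched as a simple word n-gram overlap check in Python. This is one possible approach under stated assumptions, not the method any provider is known to use: the function names, the sample texts and the 0.5 threshold are all hypothetical.

```python
def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(generated: str, reference: str, n: int = 3) -> float:
    """Share of the generated text's n-grams that also occur in the reference."""
    generated_ngrams = ngrams(generated, n)
    if not generated_ngrams:
        return 0.0
    return len(generated_ngrams & ngrams(reference, n)) / len(generated_ngrams)


def passes_filter(generated: str, references: list[str],
                  threshold: float = 0.5, n: int = 3) -> bool:
    """Reject an output that overlaps too heavily with any reference text."""
    return all(overlap_ratio(generated, ref, n) < threshold for ref in references)


# Hypothetical reference work and outputs, for illustration only.
protected = ["the quick brown fox jumps over the lazy dog"]

print(passes_filter("the quick brown fox jumps over the lazy dog", protected))
print(passes_filter("a completely different sentence about trade marks", protected))
```

A check like this only catches close textual copies; it says nothing about whether an output that passes is free of infringement.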

Some providers even offer some form of legal indemnification to users, reflecting the growing awareness of potential legal risks.

In this context, the study notes that public bodies can help to mitigate potentially infringing outputs and detect synthetic content by providing technical support, raising awareness of technical standards, promoting ethical AI use and supporting the interoperability of output transparency measures across platforms and GenAI systems. A joint approach – legal, technical, and institutional – is key to managing the risks of GenAI output.

Evolving direct licensing market

The study identifies a new market which is forming for direct licensing of copyright-protected content for AI training, driven by demand for high-quality datasets and concerns over future data availability. Press, scientific and academic publishing are early movers.

This market is enabled by the Article 4 DSM Directive opt-out mechanism, which makes it a copyright infringement for AI developers to use opted-out works without authorisation, including works that may be available under licence.

A functioning licensing system requires strong opt-out mechanisms, which can provide new income streams for rights holders. This also creates a market for technical solutions for managing access to content (particularly in online settings) and administering TDM rights reservations.

The study further lists several key considerations that may affect the evolution of licensing practices. They will depend on benchmark market rates, metrics used for remuneration, legal frameworks for remuneration and compensation models.

Conclusion

Navigating the interaction of GenAI and copyright will require coordinated and forward-looking action. The study calls for technical standards, policy tools and collaboration to keep copyright law able to address the implications of large-scale AI adoption.

Institutions such as the EUIPO can support this through guidance, databases, and awareness initiatives.

The upcoming Copyright Knowledge Centre is a step toward that goal, but long-term success depends on continued cooperation among GenAI developers, policymakers and rights holders, in order to (re-)establish a balance between the creative and IT industries.

Gabriele Engels is a Partner with D Young & Co in Munich and Chair of the MARQUES Cyberspace Team

Posted by: Blog Administrator @ 09.35
Tags: GenAI, copyright, EUIPO, Cyberspace
