With Anthropic's $1.5 billion copyright settlement, the AI industry will have to address its training data issues. Up to 40 other cases are pending that seek damages for unlicensed data, including one that takes Midjourney to court for generating Superman images.
Without some kind of licensing system, AI companies could face a wave of copyright lawsuits that some worry will set the industry back permanently.
Now, a group of technologists and web publishers has launched a system that enables data licensing at scale, provided that AI companies support it. The system is called Really Simple Licensing (RSL), and it has already been backed by major web publishers such as Reddit, Quora, and Yahoo. The question now is whether that momentum is enough to bring the major AI labs to the bargaining table.
According to RSL co-founder Eckart Walther, who also co-created the RSS standard, the goal was to create a training data licensing system that could scale across the internet. “We need to have a machine-readable license agreement for the internet,” Walther told TechCrunch. “This is really the problem RSL solves.”
For years, groups like the Dataset Providers Alliance have been pushing for clearer collection practices, but RSL is the first attempt at a technical and legal infrastructure that could make them work in practice. On the technical side, the RSL protocol lays out specific licensing terms that publishers can set for their content, whether that means requiring AI companies to obtain a custom license or adopting Creative Commons provisions. Participating websites will include the terms as part of their “robots.txt” file in a prearranged format, making it straightforward to determine which data falls under which terms.
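As a rough illustration of what machine-readable terms in “robots.txt” could look like to a crawler, here is a minimal Python sketch. It assumes a hypothetical `License:` line that points to a terms document. The directive name, the sample URL, and the helper `find_license_hints` are illustrative assumptions, not syntax confirmed by the RSL team in this article.

```python
from dataclasses import dataclass


@dataclass
class LicenseHint:
    user_agent: str   # the User-agent group the directive appeared under
    license_url: str  # where the (hypothetical) machine-readable terms live


def find_license_hints(robots_txt: str, directive: str = "license") -> list[LicenseHint]:
    """Collect licensing pointers from a robots.txt body.

    The directive name defaults to "license", which is an assumption made
    for this sketch rather than a confirmed part of any standard.
    """
    hints = []
    current_agent = "*"  # directives seen before any User-agent line apply to all
    for raw in robots_txt.splitlines():
        line = raw.strip()
        # Skip blank lines, whole-line comments, and anything without a key.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split once so URLs keep their colons
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            current_agent = value
        elif key == directive and value:
            hints.append(LicenseHint(current_agent, value))
    return hints


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Allow: /
License: https://example.com/license.xml
"""

if __name__ == "__main__":
    for hint in find_license_hints(SAMPLE):
        print(f"{hint.user_agent}: {hint.license_url}")
```

A crawler that respected such a file would fetch the linked terms before deciding whether, and on what conditions, to ingest the page.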
On the legal side, the RSL team has established a collective licensing organization, the RSL Collective, which can negotiate terms and collect royalties, similar to ASCAP for musicians or MPLC for films. As in music and film, the goal is to give AI companies a single point of contact for paying royalties and to give rights holders a way to set terms with dozens of potential licensees at once.
Many web publishers have already joined the collective, including Yahoo, Reddit, Medium, O’Reilly Media, Ziff Davis (owner of Mashable and CNET), Internet Brands (owner of WebMD), People Inc., and The Daily Beast. Others, such as Fastly, Quora, and Adweek, are supporting the standard without joining the collective.
It is worth noting that the RSL Collective includes some publishers that already have licensing deals, most notably Reddit, which receives an estimated $60 million a year from Google for the use of its training data. Nothing stops companies from cutting their own deals within the RSL system, just as Taylor Swift can set special terms for licensing while still collecting royalties through ASCAP. But for publishers too small to strike deals on their own, RSL's collective terms are likely to be the only option.
But while it is easy to tell when a song has been played, AI models pose unique challenges when it comes to figuring out when a specific piece of training data has been used. The problem is simplest for products like Google's AI search summaries, which pull data from the web in real time and maintain strict attribution for each fact.
However, if training was not logged when it occurred, it is almost impossible to confirm that a given document was ingested by an LLM. This is especially challenging if publishers require payment per inference rather than a blanket fee, which is one of the options offered by the stock RSL license.
Still, RSL's creators believe that AI companies will be able to manage the difficulty. “Some of the licensing agreements they have completed require them to report it,” said Doug Leeds, co-founder of RSL and former CEO of IAC Publishing. “It doesn’t have to be perfect. It has to be good enough to get paid.”
The bigger question is whether AI companies will accept the system. As the success of companies like Scale AI and Mercor shows, frontier labs will certainly pay for data, but the web has traditionally been seen as a source of cheap, low-quality data. With datasets like Common Crawl already available, it may be a challenge to extract royalties from labs for something they are used to getting for free. And as the recent dust-up between Cloudflare and Perplexity shows, it is not always straightforward to tell the difference between web scraping and machine-augmented browsing.
When I asked Leeds, he pointed to recent comments from AI leaders calling for systems like RSL, most notably from Sundar Pichai at last year’s Dealbook Summit. Whether or not those calls for a licensing system were serious, the RSL team plans to hold them to it. “They said to everyone that something similar needs to exist,” Leeds told me. “We need a protocol. We need a system.”
Now, they might get one.