RSL, the New Protocol That Wants to Make AI Pay for Web Content

by Dario Ferrero

If artificial intelligence were the Pac-Man of the digital age, the internet would be its endless maze full of dots to devour. Except this time the dots are our articles, our photos, our videos, and Pac-Man has never reached for his wallet. It is in this digital wild west that Really Simple Licensing (RSL) is born, a new standard that promises to bring some order to the unchecked scraping of data for AI training.

The Return of the Father of RSS

As in any good tech comeback story, this one has its origins in a web legend. Eckart Walther, a co-creator of RSS in the late 90s, returns to the scene not out of nostalgia, but with a specific mission: to give content creators the tools to decide how their intellectual property is used in the age of artificial intelligence.

Alongside Walther, the RSL project is led by Doug Leeds, a former CEO of Ask.com. Together they co-founded the RSL Collective, the organization behind the standard, combining technical experience, entrepreneurial vision, and deep knowledge of the digital market.

The genesis of the project is rooted in a frustration shared by many publishers: seeing their content used to train AI models without any explicit consent or compensation. "RSL is an open standard that allows publishers to define machine-readable license terms for their content, including attribution, pay-per-crawl, and pay-per-inference compensation," explains the project's official website.

A SIAE for the Digital Age

If we were to find a real-world analogy, RSL works like a hi-tech version of SIAE, the Italian collecting society for music and authors' rights, but applied to the world of web content. The protocol allows publishers to define, in a standardized and machine-readable way, the terms under which their content may be used for AI training.

Technically, RSL is based on an XML format that can be integrated directly into web pages or provided as a separate feed. The system provides for different types of licenses: from simple attribution to "pay per crawl" or "pay per inference" models, where compensation is calculated based on the actual use of the content in AI models.

The implementation is surprisingly elegant in its simplicity. A publisher can specify that their content requires a custom license for AI training, or make it available under Creative Commons with simple attribution. It's like having a digital sign that says "to pass, pay the toll," but written in a language that even the most sophisticated bots can understand.
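To make the idea concrete, here is a minimal Python sketch of how a crawler might read such machine-readable terms. The XML is purely illustrative: the element and attribute names (`content`, `license`, `permits`, `payment`) and the price are assumptions made for this example, not copied from the RSL specification, which remains the reference for the real vocabulary.

```python
import xml.etree.ElementTree as ET

# Illustrative license document. Element names, attribute names, and the
# price are assumptions for this sketch, not the official RSL schema.
LICENSE_XML = """
<rsl>
  <content url="/blog/">
    <license>
      <permits>ai-train</permits>
      <payment type="attribution"/>
    </license>
  </content>
  <content url="/premium/">
    <license>
      <permits>ai-train</permits>
      <payment type="crawl" amount="0.002" currency="USD"/>
    </license>
  </content>
</rsl>
"""


def read_terms(xml_text):
    """Return one dict of license terms per <content> element."""
    root = ET.fromstring(xml_text.strip())
    terms = []
    for content in root.findall("content"):
        payment = content.find("license/payment")
        terms.append({
            "url": content.get("url"),
            "permits": [p.text for p in content.findall("license/permits")],
            "payment_type": payment.get("type") if payment is not None else None,
            "amount": payment.get("amount") if payment is not None else None,
        })
    return terms


if __name__ == "__main__":
    for entry in read_terms(LICENSE_XML):
        print(entry)
```

The point of the sketch is simply that the "digital sign" is structured data: a bot does not have to interpret legal prose, it only has to look up which terms apply to the URL it wants to fetch.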

The Web Giants Mobilize

The launch of RSL did not happen in a vacuum. Some of the biggest names on the web have decided to support the initiative from the outset: Reddit, Yahoo, Automattic (the company behind WordPress.com), Quora, and Medium have all joined as early adopters.

The decision of these giants is not accidental. Reddit, in particular, has already experimented with monetizing its data for AI through direct agreements with Google and OpenAI. The adoption of RSL represents a natural evolution of this strategy, allowing the licensing process to be automated and standardized.

Yahoo, for its part, brings to the table a wealth of content accumulated over decades of activity, while Medium and Quora represent two of the main platforms for user-generated content. Their participation signals that RSL is not just a matter for large media companies, but touches the entire ecosystem of digital content creation.

The Technology Under the Hood

From a technical point of view, RSL presents itself as a natural evolution of existing protection mechanisms. If robots.txt was the digital equivalent of a "no entry" sign, RSL is more like a sophisticated automatic ticketing system.

The protocol supports different payment and licensing methods. A publisher can choose to require a subscription for access to their content for AI training purposes, or opt for a pay-per-use model. The flexibility of the system also allows for the definition of different licenses for different types of content on the same platform.
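A rough sketch can show how different terms for different sections of the same site might be resolved, and how a pay-per-crawl bill differs from a pay-per-inference one. Everything here is hypothetical: the model labels, the path prefixes, and the prices are invented for illustration and do not come from the RSL specification or from any real publisher.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Terms:
    prefix: str          # URL path prefix the terms apply to
    model: str           # "free", "per_crawl" or "per_inference" (hypothetical labels)
    unit_price: Decimal  # price per crawl or per inference (made-up figures)


# Hypothetical terms for one site: free by default, paid for news,
# and a usage-based price for investigative pieces.
SITE_TERMS = [
    Terms("/", "free", Decimal("0")),
    Terms("/news/", "per_crawl", Decimal("0.01")),
    Terms("/news/investigations/", "per_inference", Decimal("0.0005")),
]


def terms_for(path, site_terms=SITE_TERMS):
    """Pick the most specific rule, i.e. the longest matching prefix."""
    matches = [t for t in site_terms if path.startswith(t.prefix)]
    return max(matches, key=lambda t: len(t.prefix))


def amount_due(path, crawls, inferences):
    """Compute what a crawler would owe for one path under its terms."""
    terms = terms_for(path)
    if terms.model == "per_crawl":
        return terms.unit_price * crawls
    if terms.model == "per_inference":
        return terms.unit_price * inferences
    return Decimal("0")


if __name__ == "__main__":
    print(amount_due("/news/today.html", crawls=3, inferences=1000))
    print(amount_due("/news/investigations/report.html", crawls=3, inferences=1000))
```

The design choice worth noting is the longest-prefix rule: it lets a publisher set a default for the whole site and override it only where the content is more valuable. `Decimal` is used instead of floats so that very small unit prices add up without rounding drift.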

The integration with existing systems has been designed to be as non-invasive as possible. RSL can coexist with robots.txt and other standards, adding a layer of granularity in rights management that simply did not exist before. It's like moving from an on/off switch to a dimmer with infinite gradations.
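How the two layers could work together can be sketched in a few lines of Python. The example assumes that a site advertises its license file through a `License:` line in robots.txt; treat that directive name, the `ExampleAIBot` user agent, and the example.com URLs as assumptions of this sketch. The on/off check uses `urllib.robotparser` from the standard library, which skips directives it does not recognize.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt. The "License" line is an assumption of this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
License: https://example.com/license.xml
"""


def license_urls(robots_text):
    """Collect the values of every (assumed) License directive."""
    urls = []
    for line in robots_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "license":
            urls.append(value.strip())
    return urls


def crawl_decision(robots_text, user_agent, url):
    """First the on/off switch (robots.txt), then the dimmer (license terms)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    if not parser.can_fetch(user_agent, url):
        return "blocked by robots.txt"
    found = license_urls(robots_text)
    if found:
        return f"allowed, subject to license terms at {found[0]}"
    return "allowed, no license terms declared"


if __name__ == "__main__":
    print(crawl_decision(ROBOTS_TXT, "ExampleAIBot", "https://example.com/private/a"))
    print(crawl_decision(ROBOTS_TXT, "ExampleAIBot", "https://example.com/blog/a"))
```

A crawler that knows nothing about licensing would still obey the `Disallow` rule, which is what "coexistence" means in practice: the old switch keeps working, and the new terms are visible only to bots that choose to look for them.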

[Image: examples from rslstandard.org]

The Challenges of Enforcement

Of course, not everything is rosy in the RSL garden. The main challenge remains that of enforcement: how to ensure that AI crawlers actually respect the specified licenses? It is here that the project reveals its still experimental nature and its potential weaknesses.

Unlike robots.txt, which has enjoyed almost universal respect from well-behaved crawlers, RSL enters much more complex territory from a legal and economic point of view. If an AI company ignores the RSL licenses and uses the content anyway, what are the practical consequences? And above all, how can a small publisher enforce their rights against tech giants with legions of lawyers?

The answer, at the moment, is still under development. The project relies on the fact that the main AI companies have an interest in maintaining transparent and legal relationships with content providers, especially at a time when regulation of the sector is becoming increasingly stringent.

The Data Market Evolves

RSL arrives at a particularly interesting time for the data economy. The deal between Reddit and Google for the use of the platform's content in AI training, reportedly worth about $60 million a year, has set a precedent, showing that there is a real and substantial market for this type of content.

The new standard could democratize this market, allowing even smaller publishers to monetize their content instead of seeing it simply "requisitioned" by AI crawlers. It's a bit as if, after years in which anyone could enter your store and take the goods for free, a system finally arrived to make them pay the bill.

The challenge will be to create an ecosystem where the value of content is recognized without creating excessive barriers to innovation in AI. It is a delicate balance, similar to the one the music industry had to find with the advent of streaming.

When Indies Meet the Majors: The New Content Ecosystem

If the big players like Reddit and Yahoo represent the "major labels" of digital content, RSL could finally give a voice to the "indie artists" of the web: independent bloggers, creators on niche platforms, small news outlets. It is here that the new standard shows its most revolutionary potential.

A blogger who writes about vegan cooking from their home kitchen could find their content used to train culinary chatbots without ever seeing a cent. With RSL, that same blogger could specify that their content requires a commercial license for AI use, turning their passion into a source of passive income.

The situation is reminiscent of that of musicians before the advent of Spotify and streaming platforms: only the major record labels had the bargaining power for advantageous agreements, while independent artists remained on the sidelines. RSL promises to change this dynamic in the world of digital content.

Intermediary platforms play a crucial role in this transformation. WordPress.com, which hosts millions of blogs, could implement RSL as a native feature, allowing its users to automatically monetize their content for AI use. Substack could do the same for its newsletter writers, creating a new revenue stream for independent creators.

But not all that glitters is gold in the land of pixels. The adoption of RSL by small creators presents unique challenges. The technical complexity of implementation, the need to understand the different licensing models, and above all the ability to enforce one's rights are all significant obstacles for those who do not have a legal team behind them.

This is where the importance of technological intermediaries comes into play. Platforms like Medium, which has joined the RSL project, could act as "rights aggregators," negotiating collective agreements for their creators and distributing the proceeds. It is a model reminiscent of that of music collecting societies, but applied to the digital world.

The real test for RSL will be to demonstrate that it can create value even for the smallest creators, not just for the web giants. If a food blogger can earn enough from RSL to buy better ingredients for their recipes, then the system will have truly democratized the digital content economy.

AI That Behaves Well: Compliance, Legislation, and the Future of Digital Rights

If RSL were a Star Wars character, it would be C-3PO: obsessed with protocol, rules, and the correct interpretation of intergalactic laws. And like the golden droid, RSL could prove to be more valuable than it initially seems, especially in a regulatory universe that is becoming increasingly complex.

The timing of RSL's launch is not accidental. Europe has already approved the AI Act, the most comprehensive legislation on artificial intelligence in the world, which entered into force in August 2024 and applies in stages, with the obligations for general-purpose AI models applying from August 2025. The United States is working on its own regulatory frameworks, while China has already implemented several specific regulations for AI. In this context, having a standard that facilitates compliance becomes not only useful, but essential.

The European AI Act, in particular, introduces transparency obligations on the data used to train AI models. Providers of general-purpose models must publish a summary of the content used for training and adopt a policy to comply with EU copyright law, including respect for rights reservations expressed by rights holders. RSL fits naturally into this framework, providing a standardized, machine-readable mechanism for expressing and managing those rights.

The parallel with the GDPR is illuminating. When the European privacy regulation came into force in 2018, many cried catastrophe, predicting the end of the free web. Instead, the GDPR created a new global standard, pushing even non-European companies to adopt more privacy-friendly practices. RSL could follow a similar trajectory: starting as a response to specific regulatory needs and becoming a de facto global standard.

The legal risks of using content without permission are growing. In 2023, several rights holders took legal action against AI companies over the unauthorized use of their content: The New York Times sued OpenAI and Microsoft in December of that year, and other publishers have been considering similar actions. In this scenario, RSL could act as a kind of "safe harbor": those who comply with it would be on firmer legal ground than those who ignore content licenses altogether.

Regulators are paying more and more attention to these developments. The US Federal Trade Commission has already opened several investigations into the data collection practices of AI companies, while the Italian Competition Authority has initiated similar proceedings. Having a recognized standard like RSL could facilitate dialogue between companies and regulators, creating a shared framework for discussion.

The global perspective is particularly interesting. While Europe tends towards stringent regulation and the United States prefers a more market-driven approach, Asia presents a varied landscape. Countries like Singapore and South Korea are experimenting with "regulatory sandboxes" for AI, where standards like RSL could be tested in controlled environments before wider adoption.

But perhaps the most intriguing aspect is how RSL could evolve beyond its initial purposes. If the system proves effective in managing content rights for AI, it could extend to other areas: from managing rights for multimedia content to defining standards for the ethical use of personal data. It's a bit as if we were witnessing the birth of a new "operating system" for digital rights.

Prospects and Final Considerations

RSL certainly represents a step towards a fairer web in terms of how the value created by digital content is distributed. However, its success will depend on the ability to create an ecosystem in which all the main players - publishers, AI companies, and technology intermediaries - find it worthwhile to participate.

The history of technology is full of promising standards that have failed to reach the critical mass necessary to become truly ubiquitous. RSS itself, despite its usefulness, never became as mainstream as its creators had hoped. RSL will have to avoid this fate, and it can only do so by demonstrating concrete value for all the actors involved.

In an era where artificial intelligence promises to revolutionize every aspect of our digital lives, having tools that allow content creators to maintain control over their intellectual property is not only desirable, it is essential. RSL could be just the right tool at the right time, but as always in the world of technology, only the market will have the final say.

The future will tell if this new standard will succeed in transforming the digital data wild west into a more civilized frontier, where everyone can thrive. In the meantime, publishers and AI companies would do well to keep an eye on this evolution: it could define the rules of the game for decades to come.