Anthropic Acquittal
A recent ruling by a US District judge in California in favor of Anthropic (a private company), the developer of the ‘Claude’ family of LLMs, could change the way in which models are trained, tilting the law toward the ‘fair use’ side of Section 107 of the US Copyright Act. To understand the basics of the decision, here are a few facts about Section 107:
- The section discusses the limited use of copyrighted material without explicit permission under certain circumstances, trying to balance the exclusive rights of copyright owners with the public interest in free expression, creativity, and information availability.
- These are not clear-cut rules with easily followed formulas; because they are fact-specific, they must be evaluated on a case-by-case basis.
- More generally, there are some purposes for which the use of copyrighted material ‘may’ be considered ‘fair use’, meaning no permission is needed, but each instance is subjective. They are:
- Criticism
- Comment
- News reporting
- Teaching/Scholarship
- Research
- Parody
The nature of the work is also a big factor. Is it factual or creative? Factual works receive thinner copyright protection (facts are not copyrightable), so using them is more likely to be considered fair use than using creative works. Whether the work is published or unpublished also matters, as original authors are typically granted the right to be the first publisher of their works. The amount of the copyrighted work used is a factor as well. While there are no specific rules, using less gets you more fair use leeway, especially relative to the size of the work, and using less of the key points or ‘heart’ of the copyrighted material also gets more fair use latitude.
The last major factor is the tricky one. What is the effect of the use on the value of the copyrighted work or the value of the market overall? Will the use of the work be a substitute for the work itself and therefore reduce the value to the author? This includes the current and future markets and is a tacit test to see how directly this potential use competes with the demand for the original (copyrighted) version. The more ‘transformative’ the use, the less competitive it is considered.
While we point to a number of key factors, this is a very nuanced topic that has been argued in the courts ad infinitum. Regardless of the ruling we summarize below, every lawyer worth his or her salt will argue that whatever precedent has been set does not apply in their case and that the facts must be evaluated on their own merits, leaving definitive rulings to future generations.
Here is our summary of the ruling, as short and sweet as possible:
The case is a dispute between several authors and Anthropic, the developer of the Claude family of LLMs. The authors contend that Anthropic used their copyrighted works to train its AI models without the necessary permission. They allege that Anthropic pirated some of the works, purchased others, and scanned the pages into a ‘research library’. The models were trained on various subsets of the pirated works and the library copies.
The judge ruled in Anthropic’s favor on training, finding that the use of the works to train the LLMs was ‘transformative’ and therefore qualified as a ‘fair use’. The judge pointed to the fact that none of the works were ever provided to users and that statistically mapping relationships between text fragments is fundamentally different from copying. He cited the fact that humans read and learn from copyrighted material, allowing them to create new works, just as an LLM might ‘read’ the text relationships and create something new. The judge also allowed Anthropic’s book digitization, reasoning that it was a digital replacement of books already purchased or licensed, not the creation of new works or additional copies.
That said, the judge did not allow Anthropic to use pirated copies for its digital library, finding that keeping the works in a ‘central library’ was not a fair use, even though the training was. Anthropic made the case that pirating the works was ‘reasonably necessary’ for training the LLMs but was unable or unwilling to show which specific works were used in training as opposed to those put in the library. The judge did not buy that argument.
All in, while the headlines might chalk the ruling up as a victory for the AI developers, it will more than likely be appealed and continue through the courts for years. With many other cases pending, there will be no shortage of rulings for both sides until there is enough specific case law that the courts don’t have to cite fair use cases from the days before there were AIs. The question remains, however: what is to stop anyone from using copyrighted material to train an LLM when the US is in a race to develop AI models faster than our rivals? In this case there are no indigenous tribes that have to be ‘relocated’ to mine the land or drill the oil, just a few writers who won’t be able to cash in on the AI boom as easily as they thought. How is that going to hurt anybody?
It won’t, unless you happen to be the author who spent years writing with the hope that his or her works might mean something to somebody. It won’t, unless you composed a unique melody that could become a song we remember significant life events by, and it won’t, unless you happen to be the painter who trained for years to be able to tint the sky in a way never done before. All it takes is “Hey Gemini, create a song just like that one that I just played and post it to my Insta account” and the minutes, hours, days, and weeks that went into the creation of that original song you just heard are now a set of millions of relationships. Those relationships can be used to create a song just a bit different from the one you just heard, in a few seconds. Some folks say that’s a plus. We don’t.


