What the NYT Case Against OpenAI, Microsoft Could Mean for AI and Its Users
The latest battle over the way artificial intelligence uses copyrighted material could shape the future of the industry.
At a Glance
- ChatGPT and OpenAI appear near-constantly at the heart of AI debates over the technology’s promises and its controversies.
- In April 2023, the New York Times reached out to Microsoft and OpenAI to discuss intellectual property issues.
- Many other copyright holders challenged OpenAI’s use of their material to train its chatbot before NYT joined the fray.
OpenAI is at the center of several legal battles focused on the use of copyrighted materials. In December 2023, the New York Times sued OpenAI and Microsoft, which has billions of dollars invested in OpenAI. The lawsuit argues that the companies used millions of Times articles to train chatbots (OpenAI’s ChatGPT and Microsoft’s Copilot) that now compete with the publication.
“Settled copyright law protects our journalism and content. If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission. They have not done so,” according to an emailed statement from The New York Times.
While this is not the first case of this nature to challenge the way AI companies train their large language models (LLMs), it has the potential to set a precedent for the relationship between copyright holders and AI companies.
“The New York Times has been part of important free speech and copyright cases in the past,” Cobun Zweifel-Keegan, managing director of the International Association of Privacy Professionals (IAPP), Washington, D.C., tells InformationWeek. “And so, I think just their involvement in the case … ups the ante of the types of scrutiny that OpenAI has faced for the use of materials for generative AI purposes.”
It will take time for this case to reach a conclusion, whether through settlement or a court ruling, but it already raises important questions for AI system developers and their users. What should CIOs and other enterprise leaders be thinking about as this case and others shape the legal outlook for AI?
The Legal Outlook for OpenAI
OpenAI is not the only AI company to face these kinds of legal challenges, but in many ways, its market-leading position has made it the face of its burgeoning industry. ChatGPT and OpenAI are almost always at the heart of AI debates over the technology’s promises and its controversies. The lawsuits OpenAI faces could signal widespread changes for the AI industry as a whole.
In April 2023, the New York Times reached out to Microsoft and OpenAI to discuss intellectual property issues. Before the lawsuit was filed, the publication explored “…the possibility of an amicable resolution, with commercial terms and technological guardrails that would allow a mutually beneficial value exchange between Defendants and The Times,” according to the filed complaint.
“It's clear that they just couldn't come to an agreement on the number,” says Zweifel-Keegan.
Many other copyright holders challenged OpenAI’s use of their material to train its chatbot before the New York Times joined the fray. More than a dozen major authors banded together to file a lawsuit against OpenAI over its use of their books to train ChatGPT, the New York Times reported in September 2023. A month later, several nonfiction authors filed a proposed class action lawsuit against OpenAI and Microsoft, outlining similar complaints.
These lawsuits are likely to be followed by others as AI investment and development continue to steam ahead. “I do think that the incentives are there for people to bring lawsuits against OpenAI due to their very high valuation and their very high profit margins,” says Joseph Thacker, principal AI engineer and security researcher at SaaS security company AppOmni.
But solutions may emerge outside of the courtroom. A statement from an OpenAI spokeswoman indicated that the company is working with other publishers, according to the New York Times article on its lawsuit. In July 2023, the Associated Press reported it had made a deal to license its archive of news stories to OpenAI.
As intellectual property disputes mount, AI companies may opt to purchase data before putting it to work to avoid lawsuits. “That might start to emerge as the best practice, whether or not the law catches up,” says Zweifel-Keegan.
The Potential Impact of the Lawsuit
Many proponents of AI argue that legal challenges and potential regulation will hamper innovation, but lawsuits and new laws are likely to emerge regardless. The New York Times case and similar lawsuits signal that copyright law may need to be updated to reflect the reality of a technology poised to become ubiquitous.
“We just don't have law on how training AI systems should be treated vis-à-vis copyright law,” says Sekou Campbell, a media and technology attorney and partner at full-service law firm Culhane Meadows.
New technology has driven major revisions to copyright law in the past. “In 1909, it was the radio. In 1976, it was broadcast television,” says Campbell.
For now, AI companies and copyright holders have to make their arguments using existing statutes and court decisions.
For example, the fair use and transformative use legal doctrines could be key to AI companies’ defense against copyright lawsuits. Fair use allows the use of copyrighted work in certain circumstances. “Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original use of the work,” according to the US Copyright Office.
The New York Times argues that fair use and transformative use do not apply. “Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use,” according to the publication’s complaint.
It is possible that the New York Times case, or another like it, will make its way to the Supreme Court. And a major court decision could help lawmakers move forward with a revision to copyright law. “I think this particular case may give Congress some information,” says Campbell. “It's useful for lawmakers to have a court opinion to look at to say, ‘Okay, here are the issues that were litigated and are important to both sides. How do we reconcile them in a statute?’”
A court ruling in favor of The New York Times could indicate that licensing is the path forward for AI companies that want to use publishers’ news articles.
“Even if a court does rule in favor of OpenAI that certain uses are fair use, it might not apply to every single use,” says Kristin Grant, managing attorney of intellectual property law firm Grant Attorneys at Law. “So, there might still be a need at a future date to obtain licenses depending on the type of uses being made by these companies.”
Licensing deals are already being struck, like the one between AP and OpenAI. Apple is also actively pursuing deals with news publishers to use their material in the training of generative AI systems, according to The New York Times.
But how much will these licensing agreements cost? The New York Times is seeking “billions of dollars in statutory and actual damages” in its lawsuit against OpenAI and Microsoft. The Information reported that OpenAI has made offers to license news articles from media companies ranging from $1 million to $5 million per year.
“I think we will start to see how expensive that is and what that looks like in terms of building out a playbook for acquiring legal rights to the use of copyrighted or other types of protected works for training generative AI systems,” says Zweifel-Keegan.
Licensing can solve many copyright issues. AI companies with smaller models can easily reach out to copyright holders to license the specific data needed for training, but that issue becomes stickier for the larger, foundational LLMs. “You can't license the entire internet,” Campbell points out.
"You can't license the entire internet." --Sekou Campbell, partner, Culhane Meadows
While licensing could become a legal reality for AI companies to consider when training their models, questions remain in the interim. What could become of models deemed to be in violation of copyright laws? The New York Times lawsuit calls for the destruction “…of all GPT or other LLM models and training sets that incorporate Times Works,” according to the complaint.
Courts could require companies to delete or retrain models if the data is deemed to be used inappropriately. This enforcement tool is known as algorithm disgorgement or model deletion, and the Federal Trade Commission (FTC) has used it in five cases against tech companies since 2019, CyberScoop reports. While privacy and consumer protection were largely the focus of these cases, it is possible that the use of this tool could expand.
“You could see the same sort of outcome happening as a settlement in an IP action or in a regulatory action that’s focused on the misuse of materials,” says Zweifel-Keegan.
Considerations for Users
OpenAI and Microsoft are the defendants in the New York Times lawsuit, but the case raises questions about not only AI systems’ input but also their outputs. What kind of legal risk could AI system users face?
“The New York Times was able to get the system, through very careful prompt engineering, to generate entire New York Times articles. That is showing that when deployed, the system is capable of potentially violating copyright rules. That means the deployer could potentially be liable as well,” says Zweifel-Keegan.
In September 2023, Microsoft announced its Copyright Commitment, which provides intellectual property indemnity to its commercial Copilot customers. OpenAI also offers Copyright Shield, indemnification for copyright infringement that applies to ChatGPT Enterprise and its developer platform.
As debates about the ethics and use of AI continue, many agree that responsibility for bad outcomes falls on both the developer and the deployer. Could that assumption of shared responsibility spill over into the intellectual property arena?
“[It] would not be surprising to see future litigation that also tries to reach deployers who have used other people’s systems, especially if they haven’t done their own due diligence to make sure that there are those checks and balances in place,” says Zweifel-Keegan.
Indemnification clauses may mean that an AI system developer takes on the legal costs resulting from a copyright infringement case, but users will need to meet the terms and conditions of those clauses.
Indemnification offers AI system users a layer of protection from legal exposure, but that doesn’t mean intellectual property issues are safe to ignore.
CIOs and other leaders have to consider their enterprises’ use cases for AI and the potential risks. Is there a process in place, or are employees using AI tools on their own initiative? Is an AI model spitting out content that violates copyright law? “If AI-generated content is put into otherwise protected content like a blog post … it threatens potentially the entire blog post,” Campbell explains. “It's high-risk in the sense that you may not be able to gain protection for whatever the output is.”
Validating that content is original using tools like plagiarism detectors could help mitigate that risk. “The No. 1 thing will be to be sure that their [enterprises’] content is original and that they're validating that before they publish content,” says AppOmni’s Thacker.
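As a rough illustration of what that kind of pre-publication check involves (not any particular vendor’s product), here is a minimal sketch of a naive originality screen: it flags drafts that share long verbatim word sequences with a known reference text. The corpus, threshold, and all names below are hypothetical, and real plagiarism detectors are far more sophisticated.

```python
# Minimal sketch of a naive originality check: flag AI-assisted text that
# shares long word n-grams with a reference text. Illustration only; the
# threshold and sample texts below are hypothetical.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of lowercased word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of the candidate's n-grams that also appear in the reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & ngrams(reference, n)) / len(cand)

# Hypothetical usage: hold a draft for review if too much of it matches
# a protected source verbatim.
draft = "..."          # AI-assisted blog post awaiting review
known_source = "..."   # licensed or protected reference text
if overlap_ratio(draft, known_source) > 0.05:  # threshold is a judgment call
    print("Hold for review: significant verbatim overlap detected.")
```

In practice, enterprises would run such checks against large indexed corpora rather than a single reference text, but the underlying idea is the same: catch verbatim reproduction before it is published.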