The Renaissance Times
NVIDIA Sued Over Copyrighted Works Used in Generative AI
Dear Artisan,
We all knew this moment was coming. From the first time any of us used ChatGPT and watched it generate outputs in the style of real people, living or dead, everyone knew this was dicey territory.
Who exactly owns the IP rights, or the stylistic resonance, of a given person? Well, it looks like we are about to find out sooner rather than later.
Let’s dive in.
Hot Off The Press
Nvidia is being sued by three authors over generative AI’s use of their copyrighted works
Americans still don’t trust AI to give medical advice
Soul Machines is facing heat for “Digital Marilyn,” an AI chatbot designed to look and talk just like Marilyn Monroe
Sam Altman is back on the OpenAI board
Florida teens arrested for creating ‘deepfake’ AI nude images of classmates
The One Big Thing
IP Rights and AI Are Headed for a Collision Course
What exactly is an LLM? When most of us think of “AI” these days, we are generally referring to large language models like GPT or Claude that we use in our day-to-day lives. These models take in our queries and help us get the answers we need.
But what is really happening under the hood?
The LLM is a prediction machine: it predicts the next word of output based on the input (prompt) you feed it. These predictions come from a massive swath of training data (usually the entirety of the internet, or whatever parts of it are accessible), after which the model is fine-tuned by an army of humans to provide context and ensure the quality, accuracy, and precision of its outputs. This is a long and expensive process; by some estimates, training a new foundation model as performant as today’s leading ones would cost billions of dollars.
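To make the “prediction machine” idea concrete, here is a deliberately toy sketch of next-word prediction. Real models like GPT and Claude use neural networks with billions of parameters, not raw word counts; this is just a minimal illustration of the principle that the most likely next word is learned from patterns in the training text. The tiny `corpus` string and the `predict_next` helper are made up for this example:

```python
from collections import Counter, defaultdict

# A tiny corpus standing in for "the entirety of the internet"
corpus = "the model predicts the next word the model learns from data".split()

# For each word, count which words tend to follow it (bigram counts)
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # "model" follows "the" more often than "next" does
```

Swap in enough text, enough context, and a far more sophisticated statistical model, and you have the intuition behind an LLM: it outputs whatever its training data suggests should come next.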
Now, in order to get some of these predictions to be useful, the training data and fine-tuning pieces of this process are incredibly important. Say someone prompts an LLM for “a rap about New York City in the style of Jay-Z.” Well, the LLM needs to have knowledge about New York City and be familiar with the works of the rapper such that it can mimic that style accurately.
Jay-Z’s discography is available on the internet, so the LLM has been trained on everything the artist has produced, without ever needing to ask Jay-Z’s permission.
Now you see where the issue is, right?
Any artist or creator has a reasonable claim that they never consented to becoming training data for an LLM, at least not for free. You could argue that a rap album is training data for the next generation of rappers, but at least they have to pay for access to the content. If they don’t, it’s piracy, which is a federal crime.
This scenario is the basis of the ongoing suit between the New York Times and OpenAI.
NVIDIA, which recently announced its own generative AI model, is now facing the same scrutiny. Three authors are suing the company for using their books as training data without their consent.
On the same day, a company called Soul Machines faced significant backlash for its new product, “Digital Marilyn,” a chatbot meant to represent Marilyn Monroe.
This issue is not going to go away anytime soon. Every LLM rests on its training data, and without the best that humanity has to offer, a model will not be as performant as it could be. But to get that data, the model’s makers have to borrow (steal) knowledge from the people who produced it, without offering anything in return.
While I understand that this is how human history has worked, and that we don’t owe anything to the people whose ideas we borrow, it feels different when it’s done by a machine at mass scale.
This is one of the more interesting questions of the AI age and we are just beginning to grapple with its overall impact. Stay tuned to this one because we will be seeing this issue pop up over and over, with no clear resolution in sight.
The Gallery
Claude 3 Continues to Impress
Claude 3 is really amazing.
Are there more than a dozen humans who could have given a better answer?
infoproc.blogspot.com/2017/09/phase-…
— steve hsu (@hsu_steve)
2:28 AM • Mar 10, 2024
A short story about a potential AGI future
New story, inspired by my favorite Dostoevsky novel (link in next tweet).
— Richard Ngo (@RichardMCNgo)
3:59 AM • Mar 10, 2024
Tools
Must-have tools for every artisan to add to their toolkit:
Custom Microsoft Copilot GPTs: Create your own GPTs and share them
Sound Effects on Pika: Generate and integrate sound into your videos
OneCliq: AI content assistant with personalized and data-driven suggestions
Top 10 Alternatives to OpenAI’s Sora
ProciGen and HDM: Template-free reconstruction of human-object interaction with procedural interaction generation
Deep Tech
The newest and coolest in the research world that you need to know about:
AI Is Learning What It Means to Be Alive
MajorTOM: Terrestrial Observation Metaset, the largest community-oriented and machine-learning-ready collection of images covering over 50% of Earth's surface
On the Societal Impact of Open Foundation Models
Building a RAG application from scratch using Python, LangChain, and the OpenAI API
Stanford AI Lab: Silicon Valley is pricing academics out of AI
Insilico Medicine unveils first AI-generated and AI-discovered drug in new paper
Closing Thought
Be the training data you wish to see in the LLM
Work With Us!
The AI Renaissance is coming, and we are building the best community of people making it happen.
Contact us to sponsor your product or brand and reach the exact audience for your needs across our newsletter and podcast network.