NVIDIA Sued Over Copyrighted Works Used in Generative AI

Dear Artisan,

We all knew this moment was coming. From the first time any of us used ChatGPT and watched it produce outputs in the style of real people, living or dead, it was clear this was dicey territory.

Who exactly owns the IP rights (or the stylistic resonance) of a particular person? It looks like we are about to find out sooner rather than later.

Let’s dive in.

Hot Off The Press

The One Big Thing

IP Rights and AI Are on a Collision Course

What exactly is an LLM? When most of us think of “AI” these days, we are generally referring to large language models like GPT or Claude that we use in our day-to-day lives. These models take in our queries and help us get the answers we need.

But what is really happening under the hood?

An LLM is a prediction machine: given your input (the prompt), it predicts the next word of output, then the next, and so on. Those predictions come from a massive swath of training data fed to the model (often a large share of the internet, or whatever parts of it are accessible), after which an army of humans fine-tunes it to provide context and improve the quality, accuracy, and precision of its outputs. This is a long and expensive process; by some estimates, training a new foundation model as capable as the ones we use today would cost billions of dollars.
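For the curious, here is a toy illustration of next-word prediction. It is a minimal sketch, assuming a simple bigram model that just counts which word follows which in Python (the names `train_bigrams` and `predict_next` are hypothetical). Real LLMs use neural networks over tokens with billions of parameters, not word counts. What the toy does share with them is that it can only predict what its training text taught it.

```python
from collections import Counter, defaultdict

# Toy bigram model: a drastic simplification of an LLM, for illustration only.

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent word seen after `word`, or None if `word` was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

corpus = "the city never sleeps and the city never stops"
model = train_bigrams(corpus)
print(predict_next(model, "the"))       # city
print(predict_next(model, "city"))      # never
print(predict_next(model, "brooklyn"))  # None: never appeared in the training text
```

Ask it about a word that never appeared in its training text, like "brooklyn", and it has nothing to offer. The output is only as good as the data that went in.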

For these predictions to be useful, the training data and fine-tuning are incredibly important. Say someone prompts an LLM for “a rap about New York City in the style of Jay-Z.” The LLM needs to know about New York City and be familiar enough with the rapper’s work to mimic his style accurately.

Jay-Z’s discography is available on the internet, so an LLM can be trained on everything he has released without anyone ever asking Jay-Z’s permission.

Now you see where the issue is, right?

Any artist or creator can reasonably claim that they never consented to being training data for an LLM, at least not for free. You could argue that a rap album is training data for the next generation of rappers, but those rappers at least have to pay for access to the content. If they don’t, it’s piracy, which violates federal copyright law.

This scenario is the basis of the ongoing suit between the New York Times and OpenAI.

NVIDIA, which recently announced its own generative AI model, is now facing the same scrutiny. Three authors are suing the company for using their books as training data without their consent.

On the same day, a company called Soul Machines faced significant backlash for its new product, “Digital Marilyn,” a chatbot meant to represent Marilyn Monroe.

This issue is not going away anytime soon. Every LLM is built on its training data, and without training on the best that humanity has to offer, a model will not perform as well as it could. But to train on that material, it has to borrow (or steal) knowledge from the people who produced it without offering anything in return.

I understand that this is how human history has worked (we don’t owe anything to the people in history whose ideas we borrow), but it feels different when a machine does it at mass scale.

This is one of the more interesting questions of the AI age, and we are just beginning to grapple with its impact. Stay tuned, because this issue will pop up again and again, with no clear resolution in sight.

Claude 3 Continues to Impress

A short story about a potential AGI future

Tools

Must-have tools for every artisan to add to their toolkit:

Deep Tech

The newest and coolest in the research world that you need to know about:

Closing Thought

Be the training data you wish to see in the LLM

This one is a banger

Work With Us!

The AI Renaissance is coming and we are building the best community of the people making it happen.

Contact us to sponsor your product or brand and reach the exact audience for your needs across our newsletter and podcast network.