The Renaissance Times
Anthropic's New Claude Model Claims to Beat GPT-4
Welcome, Artisan,
There is a new model on the block, and its name is Claude.
Well, Claude is obviously not new, but Claude 3 sure is, and it’s coming in hot, making some heavy claims about its size and performance.
Let’s dive in.
Hot Off The Press
Anthropic Releases New Model Claude 3, Claims It Beats OpenAI’s GPT-4
ChatGPT can now read responses to you on mobile or web
AI tax prep chatbots are giving bad advice
Multiverse raises $27M for quantum software targeting LLM use
Alibaba continues AI push with $600 million investment in MiniMax
AMD's AI Chips Are Too Powerful to Sell to China Without a License from the Commerce Department
The One Big Thing
GPT-4 Is No Longer the Top Model… or So Its Competitors Claim
These days, when a new foundation model comes out, we don’t judge it on its own merits. We compare its performance to GPT-4 and decide whether it’s worth our time to check it out.
With its Claude 3 release, Anthropic got ahead of that conversation and boldly claimed that Opus, its largest and most capable model, already beats GPT-4 and Google Gemini on every single relevant metric (seen above). Here’s everything we know:
The Basics:
Launch of three models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus
Models vary by intelligence, speed, and cost, catering to different user needs.
Long-context understanding and near-perfect recall for handling large amounts of information (similar to Gemini)
Better adherence to brand voice and complex instructions, plus big improvements in reducing hallucinations
Opus and Sonnet are available now, with Haiku set to launch soon.
The thing about large language models is that they are black boxes. Unless a model is open source (which we strongly advocate for here), we don’t get to see how it works. That means we don’t have accurate performance testing for Claude 3 yet, so there is no fully reliable way to verify Anthropic’s claims.
That hasn’t stopped people from trying, and the results are… interesting, to say the least. Early reviews suggest that the Claude 3 models are more expensive than advertised, and that they may have been trained on specific datasets that make their performance look better at first.
It also seems like Anthropic is not quite making an apples-to-apples comparison here:
Somewhat interesting advertising choice from Anthropic, comparing their newly released Claude 3 to GPT-4 on release (March 2023).
According to Promptbase's benchmarking, GPT-4-turbo scores better than Claude 3 on every benchmark where we can make a direct comparison. twitter.com/i/web/status/1…
— Tolga Bilge (@TolgaBilge_)
8:45 PM • Mar 4, 2024
On the other hand, it’s already scoring well on PhD-level questions that even experts with internet access struggle with:
Claude 3 gets ~60% accuracy on GPQA. It's hard for me to understate how hard these questions are—literal PhDs (in different domains from the questions) with access to the internet get 34%.
PhDs *in the same domain* (also with internet access!) get 65% - 75% accuracy.
— david rein (@idavidrein)
3:34 PM • Mar 4, 2024
The main takeaway here is that competition is good. The more options people have, the more pressure there is for models to get better, faster, and cheaper. And the more tooling there is to test model performance, the more accountable these companies will be for their releases.
While we very much wish these models were all open-sourced from the get-go, the fact that there are new players in town putting pressure on OpenAI and Google (lol) means that the Great AI Arms Race of this century is starting to heat up.
The Gallery
Stability AI Releases a 3D Modeling Tool
Today we are releasing TripoSR in collaboration with @tripoai. TripoSR is a new image-to-3D model capable of creating high quality outputs in less than a second.
Learn more here: bit.ly/3P7bNMn
— Stability AI (@StabilityAI)
11:01 PM • Mar 4, 2024
Congrats, you can now have a chatbot of a dead loved one
Tools
Must have tools for every Renaissance creator to add to their toolkit:
TripoSR: A new image-to-3D model capable of creating high-quality outputs in less than a second
Zero-shot Audio Editing using DDPM Inversion
Rehearsal: LLM-empowered virtual partner to help you practice conflict resolution skills with tailored feedback
Free course: learn to make games for Unity with AI
Course: Vector Databases: from Embeddings to Applications
LaVague: Fully open-source AI pipeline to turn natural language into browser actions
Deep Tech
The newest and coolest in the research world that you need to know about:
Claude 3 takes on the Tokenization book chapter challenge
Orca-Math: Mistral-7B offshoot that excels at math word problems
DolphinCoder-StarCoder2-15b: Lots of coding knowledge and no laziness
Paper: Dialect Prejudice Predicts AI Decisions About People’s Character
Closing Thought
Large-context models are a truly wild development, and we haven’t fully wrapped our heads around them yet.
Work With Us!
The AI Renaissance is coming and we are building the best community of the people making it happen.
Contact us to sponsor your product or brand and reach the exact audience for your needs across our newsletter and podcast network.