- The Renaissance Times
Sakana AI Lets You Merge Foundation Models
Dear Artisan,
This week was full of big announcements from major players like NVIDIA and Microsoft, but a small company out of Japan is making some of the biggest waves.
Sakana AI has just released a method that lets you merge open source models to create something even more powerful.
Let’s dive in.
Hot Off The Press
Sakana AI Releases Model Merge to Work with Open Source Models
Elon Musk to Start Blindsight Implant to Cure Blindness
Drake’s AI clone is here and facing backlash
Microsoft releases first AI-focused PCs
Apple looks to partner with Baidu on integrating AI in phones
The One Big Thing
One of the beautiful parts about LLMs is that they make connections across vast amounts of data that you couldn’t possibly come up with yourself. Visual learning is still difficult for machines because of how much data it takes to process visual information, but written information is very dense, so machines can read more text in seconds than a person will in an entire lifetime.
This also means they are incredible at interpreting code and coming to conclusions that humans normally could not. It turns out that a great application for this kind of automation is evaluating and optimizing LLMs themselves.
Sakana AI of Japan has released a new method that does just this: it takes open source foundation models from Hugging Face and automatically searches for the best way to merge them for specific use cases. In early tests, the merged models outperform some of the state-of-the-art models out there.
The Basics:
Sakana AI focuses on using nature-inspired ideas for automating foundation model development, aiming to create models for specific domains through evolution.
They've introduced "Evolutionary Model Merge," a method that combines models using evolutionary techniques to find efficient merging strategies, enhancing model capabilities.
Initial tests produced state-of-the-art Japanese Large Language and Vision-Language Models, showing the method's effectiveness in creating highly capable models with less compute.
Their method also extends to Image Generation Diffusion Models, achieving promising results with a Japanese-capable model optimized for fast generation.
Sakana AI's approach demonstrates a shift towards more cost-effective and innovative model development by leveraging the collective intelligence of existing models.
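To make the idea of "evolving a merge" concrete, here is a minimal toy sketch in Python. It is not Sakana AI's actual algorithm or code. It assumes two tiny made-up "models" (lists of per-layer weights), a hypothetical fitness function standing in for a benchmark score, and a simple elitist evolutionary search over one mixing coefficient per layer. All names here (`merge`, `fitness`, `evolve_merge`) are illustrative.

```python
import random


def merge(model_a, model_b, alphas):
    """Blend two models layer by layer: alpha * a + (1 - alpha) * b."""
    return [
        [alpha * wa + (1.0 - alpha) * wb for wa, wb in zip(layer_a, layer_b)]
        for layer_a, layer_b, alpha in zip(model_a, model_b, alphas)
    ]


def fitness(model, target):
    """Toy stand-in for a benchmark: negative squared error to a target."""
    return -sum(
        (w - t) ** 2
        for layer, target_layer in zip(model, target)
        for w, t in zip(layer, target_layer)
    )


def evolve_merge(model_a, model_b, target,
                 generations=50, children=20, sigma=0.2, seed=0):
    """Elitist (1 + lambda) search over per-layer mixing coefficients."""
    rng = random.Random(seed)
    parent = [0.5] * len(model_a)  # start from a plain average
    parent_score = fitness(merge(model_a, model_b, parent), target)
    for _ in range(generations):
        # Mutate the parent with Gaussian noise, clamped to [0, 1].
        candidates = [
            [min(1.0, max(0.0, a + rng.gauss(0.0, sigma))) for a in parent]
            for _ in range(children)
        ]
        scored = [(fitness(merge(model_a, model_b, c), target), c)
                  for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        # Keep the parent unless a child strictly improves on it.
        if best_score > parent_score:
            parent, parent_score = best, best_score
    return parent, parent_score


# Hypothetical setup: model A is "good" at layer 0, model B at layer 1.
model_a = [[1.0, 0.0], [0.0, 0.0]]
model_b = [[0.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.0, 1.0]]

alphas, score = evolve_merge(model_a, model_b, target)
print("per-layer mixing coefficients:", alphas)
print("fitness of merged model:", score)
```

In this toy setup the search should learn to take the first layer mostly from model A and the second mostly from model B, which is the intuition behind merging models that are each strong at different things. A real system would score candidates on actual benchmarks and would likely use a more sophisticated optimizer and merging recipe.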
this seems like an unusually big deal.
the team at sakana AI figured out a way to take a set of open source models, and use evolutionary methods to automatically "merge" them, and achieve state of the art performance.
you can in principle take the 500K open source models on… twitter.com/i/web/status/1…
— Siqi Chen (@blader)
8:22 PM • Mar 21, 2024
At present, estimates are that it would cost tens of millions of dollars to train a new foundation model. With Sakana, all of that gets flipped on its head. There are hundreds of thousands of models open-sourced on Hugging Face, many of which are high-end, production-quality models put out by some of the biggest companies on earth.
This provides a new way to analyze and combine them for purpose-built applications. Most of these applications do not need a new foundation model themselves (nor can they afford one). But they can certainly use AI to help optimize the ones that are out there.
The Gallery
High-performing Claude models for cheap
Introducing `claude-opus-to-haiku` ✍️
Get the quality of Claude 3 Opus, at a fraction of the cost and latency.
Give one example of your task, and Claude 3 Opus will teach Haiku (60x cheaper!!) how to do the task perfectly.
And it's open-source: github.com/mshumer/gpt-pr… twitter.com/i/web/status/1…
— Matt Shumer (@mattshumer_)
10:35 PM • Mar 21, 2024
AI and the mirror test
The AI Mirror Test
The "mirror test" is a classic test used to gauge whether animals are self-aware. I devised a version of it to test for self-awareness in multimodal AI. 4 of 5 AI that I tested passed, exhibiting apparent self-awareness as the test unfolded.
In the classic… twitter.com/i/web/status/1…
— Josh Whiton (@joshwhiton)
5:50 PM • Mar 21, 2024
Tools
Must-have tools for every artisan to add to their toolkit:
Katalist: A platform to automate script visualization and maintain characters for visual storytelling
Villa: A tool to create VR virtual worlds for team collaboration
Verk: Business Process Automation
Moonshot AI and Kimi Assistant
LlamaGen.AI: Create Comics, Storyboards and Story Pitches
Deep Tech
The newest and coolest in the research world that you need to know about:
Project Devika: The open-source alternative to Devin
SOTA on OpenToM with DSPy
PolymathicAI: foundation models for science
Flops: Training large models requires maximizing flops/GPU
Distil-Whisper v3: ~50% fewer parameters and 6x faster than Large-v3
Closing Thought
Running models to evaluate open source models is the future.
Work With Us!
The AI Renaissance is coming and we are building the best community of the people making it happen.
Contact us to sponsor your product or brand and reach the exact audience for your needs across our newsletter and podcast network.