The Renaissance Times
Alibaba's Mindblowing New EMO Lets You Generate Video with Audio or Still Images

It's so over for Hollywood

February 29, 2024

Welcome, Artisan,

You now have the power of a Hollywood studio in your pocket.

As much as I like to hype the disruption of traditionally gated industries, especially those in the creative fields, I may be underplaying just how disruptive the next few years will be. AI video is getting extremely close to its ChatGPT moment, and it’s going to break people’s brains.

Tyler Perry has already put an $800 million expansion of his Atlanta studio on hold after seeing OpenAI’s Sora demos from the past few weeks.

Today, Alibaba enters the game with its newest release, EMO. Short for Emote Portrait Alive, this new tool takes a still image and an audio clip as input and generates a high-quality portrait video as output. It looks crazy.

Let’s dive in.

Hot Off The Press

Alibaba announces EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model
Morph Studio just introduced an AI filmmaking platform using its own text-to-video model in partnership with Stability AI
Anamorph releases a new product to reorder scenes with AI and create unlimited versions of one film
Adobe unveils Project Music GenAI Control, a platform that can generate audio from text descriptions
Nvidia, Hugging Face and ServiceNow collaborate on StarCoder2, a new family of open-access large language models (LLMs) for generative coding.

The One Big Thing

You are now a major motion picture director

When I talk about the AI Renaissance, it’s usually because of new tools and releases like EMO by Alibaba.

The Renaissance is a time when creation takes a great leap forward, when ideas spring up out of random pockets of the earth, and when new ways of thinking and living take hold because the world sees things it has never been exposed to before. New forms of art and new theories of the world emerge from people and groups who previously went unheard.

This is where we are headed.

As more models come out that democratize access to the greatest tools at our disposal, the resulting flurry of new and radical forms of expression will be explosive. What happens when billions of people suddenly have access to the same equipment that, in the past, only an extremely small group of people in the Los Angeles region could use?

My guess is that the productions and artwork of the future will look very little like those of the past.

One of the greatest exports of American industry in the past century has been culture, with Hollywood front and center. It’s unclear how much longer that dominance will last.

Hollywood’s moat has historically been expensive production equipment, access to talent, and (importantly) robust distribution networks that ensure a strong business model centered on the production of high-end entertainment. In the new age we are entering, the cost of production equipment is headed toward zero, and access to talent will be open and free (although there is a larger discussion to be had around IP and rights licensing that we will skip for today).

The only moat left for this industry is access to distribution, which, in case you haven’t heard, is currently being torn apart by TikTok and YouTube. It’s a precarious future for anyone in that industry.

Economists call this process creative destruction, and it is necessary for any large-scale economic transition. This is where we are now.

Where we are headed is much more glorious.

Inspiration / Gallery / Artisan Board

Today is EMO Gallery day:

This is EMO: Emote Portrait Alive from Alibaba.
An AI capable of generating emotive facial expressions with accurate lip-syncing, using just a reference image and an audio source.
9 unbelievable demos & link:
1) twitter.com/i/web/status/1…
— Proper 🧐 (@ProperPrompter)
8:01 AM • Feb 28, 2024

Alibaba presents EMO
A method for generating talking/singing head avatars with expressive facial expressions from a single image
Here the woman from the OpenAI Sora video is generated singing "Don't Start Now" by Dua Lipa
10 awesome examples and links below: twitter.com/i/web/status/1…
— Allen T (@Mr_AllenT)
11:59 AM • Feb 28, 2024

— AK (@_akhaliq)
5:10 AM • Feb 28, 2024

Bonus: Rendering protein structures inside cells at the atomic level with Unreal Engine

Ok this paper is bonkers and I love it biorxiv.org/content/10.110…
— Uri Manor 💔 (@manorlaboratory)
6:49 AM • Feb 29, 2024

Tools

Must-have tools for every Renaissance creator to add to their toolkit:

SkimAI: AI-driven scheduling and custom workflows for your email inbox.
Andrew Ng’s New Free Course on Llama 2 with Meta
Retrieval-Augmented Generation (RAG) pipeline to find/recommend Kaggle competitions
60-Minute Demo on making a feature-length Hollywood film with AI
STORM: Research assistant that can produce 3500-word articles with 20 references
Chatbot that predicts and pre-generates responses
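The RAG entry in the list names a technique worth unpacking: retrieve the documents most relevant to a query, then paste them into the prompt so the model answers from that context. Below is a minimal sketch of just those two steps, assuming a tiny hand-written corpus (the competition titles in `COMPETITIONS` are hypothetical) and plain bag-of-words cosine similarity in place of learned embeddings. It does not call an LLM and is not the linked project's code.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; titles and descriptions are made up for illustration.
COMPETITIONS = {
    "Image Tagging Challenge": "classify images of plants into species using computer vision",
    "Sales Forecast Sprint": "forecast weekly store sales from tabular time series data",
    "Toxic Comment Detection": "detect toxic language in online comments with nlp text models",
}


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[str]:
    """Retrieval step: return the k competition titles most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        COMPETITIONS,
        key=lambda title: cosine(q, tokenize(title + " " + COMPETITIONS[title])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Augmentation step: prepend the retrieved context to the user's question."""
    context = "\n".join(f"- {t}: {COMPETITIONS[t]}" for t in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nRecommend a competition."


print(retrieve("I like nlp and text classification", k=1))
```

The string from `build_prompt` would then go to whichever chat model you use; a real pipeline would typically swap `tokenize` and `cosine` for an embedding model and a vector index.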

Deep Tech

The newest and coolest in the research world that you need to know about:

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model (Paper)
StarCoder2 technical deep dive
Microsoft: The era of 1-bit LLMs
Meta Reality Labs neuromotor interface results
YODAS from WavLab: 370k hours of weakly labeled speech data across 140 languages
Rendering protein structures inside cells at the atomic level with Unreal Engine
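For the 1-bit LLM paper, the core idea, as I understand the BitNet b1.58 variant it describes, is to store every weight as one of three values, {-1, 0, +1} (about log2(3) ≈ 1.58 bits), plus a single scale per weight matrix. The pure-Python sketch below shows that "absmean" weight quantization on a flat list. The function names are hypothetical, activation quantization and training are left out, and it is an illustration rather than Microsoft's implementation.

```python
def absmean_ternary(weights, eps=1e-5):
    """Quantize a flat list of weights to {-1, 0, +1} plus one scale.

    Sketch of absmean quantization: divide by the mean absolute weight,
    round to the nearest integer, then clip to [-1, 1].
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    gamma = sum(abs(w) for w in weights) / len(weights)  # mean |w|
    scale = gamma + eps  # eps avoids division by zero for all-zero weights
    # Python's round() breaks exact .5 ties toward even; fine for a sketch.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, gamma


def dequantize(q, gamma):
    """Approximate the original weights from ternary codes and the scale."""
    return [x * gamma for x in q]


codes, gamma = absmean_ternary([0.9, -0.05, -1.2, 0.4])
print(codes)  # [1, 0, -1, 1]
```

Because every code is -1, 0, or +1, multiplying by these weights reduces to additions and subtractions, which is roughly where the efficiency gains reported in the paper come from.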

Closing Thought

❝

What will be the Citizen Kane of the AI video era? Who will be the lead actor?

Maybe I should have learned filmmaking. Or maybe it’s the right time now…

Work With Us!

The AI Renaissance is coming and we are building the best community of the people making it happen.

Contact us to sponsor your product or brand and reach the exact audience for your needs across our newsletter and podcast network.