Stable Diffusion 3 is Ready to Challenge Sora

The Renaissance Times
February 23, 2024

Welcome Renaissance Creator,

Can you feel the acceleration?

It seems like every day now we are treated to a major announcement or release that, just a year ago, would have changed the industry altogether.

Today’s game-changing announcement is Stable Diffusion 3. SD3, for short, is yet another open-source model for text-to-image generation (and maybe much more).

Let’s dive in.

Hot Off The Press

Stability AI announces Stable Diffusion 3.0, a new model using a diffusion transformer architecture
Chrome gets a built-in AI writing tool powered by Gemini
DatologyAI is building tech to automatically curate AI training datasets
Google pauses AI tool Gemini’s ability to generate images of people after historical inaccuracies
The great debate: Increased context windows vs. RAG
Reddit is selling user data to Google for AI model training
Eleven Labs joins Disney Accelerator program
Figma and Replit announce an integration to let you go from design to code in a few clicks

The One Big Thing

Stable Diffusion 3 is Here to Challenge OpenAI’s Sora

If 2023 was the year generative AI burst onto the scene for text generation, 2024 is shaping up to do the same for images and video.

Stability AI, one of the pioneering companies in text-to-image models and a leader in the open-source AI movement, has announced its newest text-to-image model: Stable Diffusion 3.

SD3 for short (great nickname, by the way; could be a rap album) uses a completely different architecture from SD1 and SD2. From Stability AI CEO Emad Mostaque: “Stable Diffusion 3 is a diffusion transformer, a new type of architecture similar to the one used in the recent OpenAI Sora model.”

WTF is a Diffusion Transformer?

A Diffusion Transformer combines diffusion models (which gradually turn random noise into an image) with transformers (the architecture behind chatbots that understand and generate language). In practice, a transformer takes over the denoising job that a U-Net handled in earlier Stable Diffusion versions.

This combo lets the model create super detailed and accurate images from text descriptions. Think of it as a super-smart artist that starts with a messy sketch and then refines it into a masterpiece, all while understanding exactly what you're asking for, like a mix of a painter and a poet in the digital world.
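For the curious, here is a toy sketch of those two ideas in plain Python: chopping an image into patch "tokens" (the form a transformer reads) and running a loop that strips away noise step by step. This is not SD3's code. The names patchify, unpatchify, toy_denoiser, and sample are all made up for illustration, and toy_denoiser cheats by peeking at a known target image, where a real model would use a trained transformer guided by your text prompt.

```python
import random


def patchify(image, patch):
    """Split a square 2-D image (list of rows) into flattened patch tokens."""
    size = len(image)
    tokens = []
    for r in range(0, size, patch):
        for c in range(0, size, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def unpatchify(tokens, size, patch):
    """Inverse of patchify: reassemble patch tokens into a 2-D image."""
    image = [[0.0] * size for _ in range(size)]
    patches_per_row = size // patch
    for idx, token in enumerate(tokens):
        r0 = (idx // patches_per_row) * patch
        c0 = (idx % patches_per_row) * patch
        for k, value in enumerate(token):
            image[r0 + k // patch][c0 + k % patch] = value
    return image


def toy_denoiser(tokens, target_tokens):
    """Hypothetical stand-in for the trained transformer.

    A real model predicts the noise from the tokens, the timestep, and the
    text prompt. This toy version simply compares against a known target.
    """
    return [[x - t for x, t in zip(token, target)]
            for token, target in zip(tokens, target_tokens)]


def sample(target, patch=2, steps=20, seed=0):
    """Start from pure noise and remove the predicted noise a bit per step."""
    rng = random.Random(seed)
    size = len(target)
    image = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    target_tokens = patchify(target, patch)
    for step in range(steps, 0, -1):
        tokens = patchify(image, patch)
        noise = toy_denoiser(tokens, target_tokens)
        # Remove 1/step of the predicted noise; the final step removes the rest.
        tokens = [[x - n / step for x, n in zip(token, predicted)]
                  for token, predicted in zip(tokens, noise)]
        image = unpatchify(tokens, size, patch)
    return image


# A 4x4 checkerboard plays the role of "the image the prompt describes".
target = [[1.0 if (r + c) % 2 == 0 else 0.0 for c in range(4)] for r in range(4)]
result = sample(target)
```

After the loop, result matches the checkerboard to within floating-point error. The real magic is in the part this sketch fakes: learning to predict the noise from a text prompt alone.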

Why Does it Matter?

Diffusion Transformers are changing the game in creating images and videos from just words. This means artists, designers, and even regular folks can bring their wildest ideas to life without needing to be pros at drawing or editing. It's like having a magic wand that turns your words into visual reality. For industries like gaming, movies, and advertising, this is huge. Imagine designing characters or scenes by just describing them.

It's making high-quality creative work more accessible and faster to produce, opening up new possibilities for storytelling and content creation.

The bedrock of a new Renaissance.

Artisan Gallery

Gen Z is already better and more productive than you at AI

Gen-z is now using deepfakes to teach each other calculus 🤣 @OnlockLearning 👏
— nisten (@nisten)
7:15 PM • Feb 22, 2024

Futuristic city based on Indian architecture

Gemini Can Watch Screen Recordings and Start Automating Your Tasks

🤯 Mind officially blown:
I recorded a screen capture of a task (looking for an apartment on Zillow). Gemini was able to generate Selenium code to replicate that task, and described everything I did step-by-step.
It even caught that my threshold was set to $3K, even though I… twitter.com/i/web/status/1…
— 👩‍💻 Paige Bailey (@DynamicWebPaige)
5:29 AM • Feb 22, 2024

…and analyze earnings reports

AI video input is crazy.
Give Gemini 1.5 Pro a recording of your browsing session and it can summarize it.
I gave it video of me reading 8 articles about Nvidia’s earnings report and it generated a comprehensive breakdown.
Insane progress.
— Mckay Wrigley (@mckaywrigley)
3:26 PM • Feb 22, 2024

Tools

Must-have tools for every Renaissance creator to add to their toolkit:

Khanmigo: AI-powered tutor from the founder of Khan Academy
Eleven Labs GPT: Custom GPT that brings any prompt to life with a realistic voice
Andrej Karpathy created a tool to generate written companion guides from long-form content
Supermaven: A code completion tool with a context window of 100,000 tokens
SD3: Stable Diffusion’s newest model, which could possibly support 3D renders
Glif browser extension allows you to edit images with AI from your browser
Wokelo: Hyper-accelerated research and due diligence powered by Gen-AI

Deep Tech

The newest and coolest in the research world that you need to know about:

Divide-and-Conquer Dynamics in AI-Driven Disempowerment
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
What happens when RAG models are fed conflicting information?
FireFunction V1: A new, open-weights function calling model: GPT-4-level structured output and decision-routing at 4x lower latency
Gradio releases update fixing security vulnerability
LGM Mini: Image to Interactive 3D in 5 seconds
Optimal model parameters for open source LLMs

Closing Thought

❝

The value of brands will go through the roof in the AI age. What are the most popular AI generated videos? Balenciaga remixes and Will Smith eating pasta. In a world of infinite content, the content relating to people you recognize is infinitely more valuable.

Me questioning value and existence in a post AGI world

Work With Us!

The AI Renaissance is coming and we are building the best community of the people making it happen.

Contact us to sponsor your product or brand and reach the exact audience for your needs across our newsletter and podcast network.