FLUX - the Successor of Stable Diffusion ⋆ Code A Star

In 2023, we introduced Stable Diffusion [1] [2] to our readers. A lot has changed since then, and the technology behind image generation keeps improving day after day. So let us introduce the supernova of the AI image generation model family: FLUX.

What is FLUX?

Although we call it a supernova, FLUX is not entirely new. It is the child of our beloved Stable Diffusion. So what happened to Stable Diffusion? The company behind it, Stability AI, ran into financial trouble. The core development team then left the company and founded Black Forest Labs, and FLUX is their new AI image generation model. Performance-wise, Black Forest Labs’ own ELO comparison (see the chart below) ranks FLUX above the later Stable Diffusion releases and the other image generation models on the market.

FLUX ELO score — (image source: https://blackforestlabs.ai/announcing-black-forest-labs/)

Financially, Black Forest Labs has already raised $31 million (USD) in funding, and the startup is aiming to raise $100 million. Besides the fundraising, FLUX has been integrated into Elon Musk’s Grok for AI image generation. All signs point to a promising future for FLUX. Okay, the history lesson is over. Time to learn how to use it ourselves.

Use of FLUX

Longtime CodeAStar readers may know that there is one thing I care about most in any software tool, platform, or framework: ease of use. And FLUX, like its predecessor, lets us generate images simply through a web UI. We spent our good old Stable Diffusion days with AUTOMATIC1111’s WebUI (A1111). This time, however, A1111 does not support FLUX, so we switch to a performance-oriented web UI, Forge.

First things first, go to Forge’s GitHub repository to install Forge on your machine. Since Forge is built on A1111, A1111 users should have no trouble using it. New users need not worry either: Forge is straightforward to use and has a gentle learning curve.

Follow the instructions on Forge’s GitHub page to download the version suitable for your platform. Once you run it, you should see the following screen in your browser (yes, Forge is a web app).

Yes, it looks just like our good old A1111. Please note that so far we only have the UI; we still need a FLUX model to perform the magic. Besides the flagship commercial model, FLUX [pro], there is the 12-billion-parameter FLUX [dev] model, which is free for non-commercial use. We can get the official model from the Hugging Face website. But there are many model variants, so which one should we get? Since we are using Forge as our interface, I recommend the model provided by Forge’s author. You can visit the author’s GitHub page for more details. Once you have the model, put the file in the following location:

<your Forge UI path>\webui\models\Stable-diffusion\
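If the model does not show up in the drop-down later, a quick check of the folder contents can help. The following is a minimal sketch, not part of Forge: the helper name `find_checkpoints` and the list of file extensions are assumptions made for illustration, and the placeholder path must be replaced with your own installation path.

```python
from pathlib import Path

# File extensions commonly used for checkpoint files (an assumption; adjust as needed).
CHECKPOINT_SUFFIXES = {".safetensors", ".ckpt", ".gguf"}


def find_checkpoints(models_dir: str) -> list[str]:
    """Return the sorted names of checkpoint-like files found directly in models_dir."""
    folder = Path(models_dir)
    if not folder.is_dir():
        return []
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() in CHECKPOINT_SUFFIXES
    )


if __name__ == "__main__":
    # Replace the placeholder with your own Forge installation path.
    models_dir = r"<your Forge UI path>\webui\models\Stable-diffusion"
    print(find_checkpoints(models_dir) or "No checkpoint files found")
```

An empty result usually means the file was saved somewhere else or has an extension the sketch does not look for.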

In the Forge UI, select “flux” at the top and click the refresh button next to the checkpoint drop-down menu. Select the model you downloaded, and our minimal FLUX setup is done!

The outputs of FLUX

Let’s start our first test using the same prompt we used in 2023 for Stable Diffusion WebUI. Enter the following prompt in the Txt2img input box and click “Generate”:

“A brown bear uses a computer in an office”

SD vs FLUX -1 — (Left) Stable Diffusion v1.5, (Right) FLUX.1 [dev]

Obviously, the FLUX image is a visual upgrade, with much higher detail. That said, FLUX’s smartness can also be its flaw: it logically chose a cartoon bear for “uses a computer in an office”, while the 2023 version was, for me, more creative and fun to look at :]].

Let’s see another example.

“Keanu Reeves, park bench, sitting, eating. (taco:1.3), (sad), water color”

Keanu 2 — (Left) Stable Diffusion v1.5, (Right) FLUX.1 [dev]

This one is a clean win. The only flaw in the FLUX output is the missing fillings in the taco.
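The Keanu prompt uses the A1111-style emphasis syntax: `(taco:1.3)` asks for a weight of 1.3 on “taco”, and plain parentheses such as `(sad)` conventionally mean a 1.1 boost. The sketch below only illustrates how that notation reads; the function name `parse_weights` is hypothetical, it ignores nested parentheses, and it is not how Forge parses prompts internally. How strongly FLUX honors these weights is not something this post establishes.

```python
import re

# Matches "(text:1.3)" with an explicit weight, or "(text)" with implicit emphasis.
_WEIGHTED = re.compile(r"\(([^():]+)(?::([0-9]*\.?[0-9]+))?\)")


def parse_weights(prompt: str, implicit: float = 1.1) -> list[tuple[str, float]]:
    """Return (text, weight) pairs for the parenthesised terms in a prompt."""
    pairs = []
    for match in _WEIGHTED.finditer(prompt):
        text = match.group(1).strip()
        # Fall back to the implicit boost when no ":weight" part is given.
        weight = float(match.group(2)) if match.group(2) else implicit
        pairs.append((text, weight))
    return pairs


prompt = "Keanu Reeves, park bench, sitting, eating. (taco:1.3), (sad), water color"
print(parse_weights(prompt))  # [('taco', 1.3), ('sad', 1.1)]
```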

In the past, we trained several Stable Diffusion models for different art styles and had to switch between them to change style. That is now history: FLUX can generate images as good as those from the specifically trained models. In the 2023 examples, we used 5 checkpoint models to generate images with the following prompt:

“woman, outdoor, half body, side view, busy street, (winter), 4k”

Now we use the same prompt in FLUX and simply mention the art style (e.g. by adding “Comic style illustration”), and we get:

SD vs FLUX in art styles — (Top) Outputs from 5 checkpoint models, (Bottom) Outputs from the same FLUX model

Things have become easier and better.
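The style switch described above is plain string work: keep one base prompt and append a style phrase to it. Here is a minimal sketch of that idea. The helper `with_style` is hypothetical, and only “Comic style illustration” comes from this post; the other style names are made-up examples.

```python
BASE_PROMPT = "woman, outdoor, half body, side view, busy street, (winter), 4k"


def with_style(prompt: str, style: str) -> str:
    """Append an art-style phrase to a prompt, separated by a comma."""
    return f"{prompt.rstrip(', ')}, {style.strip()}"


# Only the first style is from the article; the rest are hypothetical examples.
STYLES = ["Comic style illustration", "oil painting", "pixel art"]

for style in STYLES:
    print(with_style(BASE_PROMPT, style))
```

Each printed line can be pasted into the Txt2img box with the same FLUX checkpoint selected, with no model switching required.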

Beyond the graphical upgrade, FLUX is good at generating images that contain text. In case you don’t know, this was challenging in the old Stable Diffusion days. Here is the prompt we used to generate an image with text:

“In a cyber punk city, a neon sign displaying ‘CodeAStar’ text, attached to a building”

What’s next for AI image generation?

We have seen how AI image generation has evolved since last year. It is good to see technology grow and bring new and better things to society. But it also raises certain concerns, such as a lack of true creativity and the demotivation of content creators. To some extent this is inevitable, and the same thing will happen in the programming industry as well. My suggestion is to keep learning and creating while taking advantage of new technologies. By integrating AI tools into our creative processes, we can enhance our work while still expressing our unique artistic styles.
