Plain and simple: GUI tools for Stable Diffusion on the PC

Greetings! In this article, we look at what we consider the best tools for generating images with Stable Diffusion.

Popular alternatives such as DALL-E 3 and Midjourney are commercial cloud services: to use them, you buy a subscription and generate images on remote servers. Stable Diffusion, on the other hand, has a number of advantages.

First, it is an open source solution. The models and the source code are freely available, and anyone can download and use them for free.

Second, with Stable Diffusion you can generate images directly on your personal computer or laptop. This is very convenient – you don’t need to send data anywhere or pay for the use of other people’s servers.

Of course, Stable Diffusion requires a high performance video card. But if you have one, you will be able to create digital art at home.
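If you are not sure how much video memory your card has, a short script can tell you. The sketch below assumes an NVIDIA card with the nvidia-smi utility installed; the 6 GB threshold is only an assumption for this example (loosely based on the "modest" cards discussed later in the article), not an official requirement of any tool.

```python
import shutil
import subprocess


def parse_vram_mib(smi_output: str) -> list[tuple[str, int]]:
    """Parse lines such as 'NVIDIA GeForce RTX 3060, 12288' into (name, MiB)."""
    gpus = []
    for line in smi_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, mem = line.rpartition(",")
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Ask nvidia-smi for GPU names and total memory; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_vram_mib(result.stdout)


# Assumed threshold for this sketch only; not taken from any tool's documentation.
MIN_VRAM_MIB = 6 * 1024

if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU detected via nvidia-smi.")
    for name, mib in gpus:
        verdict = "OK" if mib >= MIN_VRAM_MIB else "below the assumed minimum"
        print(f"{name}: {mib} MiB ({verdict})")
```

On AMD or Intel cards, nvidia-smi is not available, so the script simply reports that no NVIDIA GPU was found.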

In this article, we will take a detailed look at the simple and straightforward GUI tools that allow you to effectively manage Stable Diffusion on your PC. Let’s go!

Fooocus

Fooocus was designed with a focus on simplicity and ease of use. Its creators were inspired by Midjourney, aiming to make the generation process as easy as possible for the user.

Unlike most other Stable Diffusion interfaces, where you have to manually adjust many parameters, Fooocus hides all the complicated technical details. You don’t need to understand the subtleties – just enter a prompt and the system will create a beautiful image.

One of the key advantages of Fooocus is its GPT-2-based prompt expansion engine. It can “read between the lines” and produce impressive results even from short, ambiguous prompts like “House in garden”.
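To get a feel for what happens to a short prompt before generation, here is a toy sketch. It does not use GPT-2 and is not Fooocus code: the term list and the expand_prompt function are hypothetical, and the additions are picked with a seeded random generator rather than a language model. It only shows the general shape of the idea: a short prompt goes in, an enriched prompt comes out.

```python
import random

# Hypothetical vocabulary for illustration only. Fooocus chooses its additions
# with a GPT-2-based model, not from a fixed list like this one.
STYLE_TERMS = [
    "highly detailed", "soft natural light", "sharp focus",
    "vivid colors", "cinematic composition", "intricate textures",
]


def expand_prompt(prompt: str, seed: int, extra_terms: int = 3) -> str:
    """Append a seed-dependent selection of descriptive terms to a short prompt."""
    rng = random.Random(seed)  # the same seed always gives the same expansion
    chosen = rng.sample(STYLE_TERMS, k=min(extra_terms, len(STYLE_TERMS)))
    return ", ".join([prompt.strip(), *chosen])


print(expand_prompt("House in garden", seed=1))
```

Tying the expansion to the seed means that repeating a generation with the same seed reproduces the same enriched prompt.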

In addition to text, Fooocus allows you to use images as input data (img2img). Various modes are supported:

Upscale/Variation – enhancement and variations of the original image
Inpaint/Outpaint – redraw areas inside the picture or extend it beyond its borders
Image Prompt – generation of a new image based on the semantics of the source image

The tool also includes functions for working with styles, quality, dimensions, negative prompts and much more. With one click you can add or remove a style for the next generation.

In addition to one-click styles, you can choose ready-made presets that use different Stable Diffusion checkpoints (a checkpoint is a file with the pre-trained neural network weights that Stable Diffusion uses for generation).

Installation is also easy: the project page on GitHub has a separate link for downloading the archive. Unpack it, run the run.bat file, wait for the model of the initial preset to download on first launch, and the program is ready to work.

You can download the program at this link:

GitHub - lllyasviel/Fooocus: Focus on prompting and generating

Overall, Fooocus is a powerful and very attractive tool for beginners. Thanks to its simple and intuitive interface, there is no need to configure complicated settings or study additional material.

InvokeAI

InvokeAI is a really interesting and feature-rich tool for working with Stable Diffusion on a PC.

On the one hand, it stands out for its modern and very convenient user interface. The image generation process is simplified as much as possible, so even novice users can quickly learn the basics of working with this tool.

Behind this convenience, however, lies powerful functionality aimed at serious creative tasks. For example, InvokeAI has a built-in Unified Canvas that combines all the main generation modes of Stable Diffusion.

On this canvas, the artist is free to combine text-to-image, image-to-image, inpainting, outpainting and other techniques into one seamless process. The possibilities are limited only by your imagination!

In addition, InvokeAI has a built-in node-based workflow system. This allows you to create flexible generation pipelines tailored to a wide variety of requirements.
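To illustrate what “node-based” means in practice, here is a minimal sketch of a node graph in Python (3.9 or newer, for graphlib). It is not InvokeAI's API: the node names, the run_graph function and the string "images" are hypothetical stand-ins. The point is that each node declares which nodes it depends on, and the engine works out the execution order by itself.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Each node is (function, names of the nodes whose outputs it consumes).
# All names and functions here are hypothetical placeholders, not real InvokeAI nodes.
Node = tuple[Callable[..., str], list[str]]


def run_graph(nodes: dict[str, Node]) -> dict[str, str]:
    """Run every node after the nodes it depends on and collect the outputs."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in nodes.items()})
    results: dict[str, str] = {}
    for name in sorter.static_order():  # dependencies always come first
        func, deps = nodes[name]
        results[name] = func(*(results[d] for d in deps))
    return results


# The declaration order does not matter; only the dependencies do.
pipeline: dict[str, Node] = {
    "save": (lambda img: f"saved({img})", ["upscale"]),
    "upscale": (lambda img: f"upscaled({img})", ["generate"]),
    "generate": (lambda p: f"image({p})", ["prompt"]),
    "prompt": (lambda: "house in garden", []),
}

outputs = run_graph(pipeline)
print(outputs["save"])  # saved(upscaled(image(house in garden)))
```

Swapping a node (for example, a different upscaler) or inserting a new one between two existing nodes changes only the graph description, which is what makes such pipelines flexible.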

However, you should keep in mind that for all its capabilities InvokeAI requires quite serious computational resources and generates images somewhat slower than more lightweight tools like Fooocus.

Working in InvokeAI, you also get access to advanced tools: upscaling, styles and embeddings, model management and more, plus an integrated gallery for storing and remixing content. Among the professional tools InvokeAI offers:

Advanced manager for downloading and managing models
Support for various neural network architectures (including SD XL)
Tools for remastering and upscaling images
A system for working with special embeddings, LoRA, etc.
Organized gallery for storing and managing art projects

To download, go to Releases, download the latest version and follow the instructions in the repository. The installation is not as easy as with Fooocus, but anyone can figure it out :)

GitHub - invoke-ai/InvokeAI: Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.

InvokeAI can truly be an ideal working environment for professional artists and studios doing high-level AI art. This tool is designed primarily for the creative and commercial markets.

Stable Diffusion WebUI Forge

The next entrant in our review is Stable Diffusion WebUI Forge.

Forge is a platform built on top of the original Web UI by Automatic1111 (one of the first and most popular interfaces for Stable Diffusion). The main goal of Forge is to optimize resource usage and speed up image generation on a local computer as much as possible.

If you have a video card with a modest 6-8 GB of memory, Forge can increase the speed of Stable Diffusion by 30-75%! At the same time, peak video memory consumption will drop by 700-1500 MB.

Moreover, with Forge you will be able to generate images at resolutions 2-3 times higher than the original Web UI can handle on the same card. The maximum batch size also grows significantly – it is 4-6 times larger than in normal mode.
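To see what these ranges could mean for a specific card, you can plug in your own numbers. The sketch below uses a made-up baseline (2 it/s, 7000 MB peak memory, batch size 2), so treat the output as arithmetic, not as a benchmark. It assumes the memory saving is 700-1500 MB, since a saving measured in gigabytes would exceed the total memory of a 6-8 GB card.

```python
def scale_range(baseline, low, high):
    """Scale a baseline value by the low and high ends of a multiplier range."""
    return baseline * low, baseline * high


STEPS = 30  # sampling steps per image (assumed)

# Hypothetical baseline for a 6-8 GB card; replace with your own measurements.
base_speed_it_s = 2.0     # iterations per second
base_peak_vram_mb = 7000  # peak video memory in MB
base_batch = 2            # largest batch that fits in memory

# 30-75% faster means 1.30x to 1.75x the baseline speed.
speed_after = scale_range(base_speed_it_s, 1.30, 1.75)
time_before = STEPS / base_speed_it_s
time_after = (STEPS / speed_after[1], STEPS / speed_after[0])  # best, worst

# Peak memory drops by 700-1500 MB.
vram_after = (base_peak_vram_mb - 1500, base_peak_vram_mb - 700)

# Maximum batch size grows 4-6 times.
batch_after = scale_range(base_batch, 4, 6)

print(f"Time per image: {time_before:.1f} s -> {time_after[0]:.1f}-{time_after[1]:.1f} s")
print(f"Peak VRAM: {base_peak_vram_mb} MB -> {vram_after[0]}-{vram_after[1]} MB")
print(f"Max batch: {base_batch} -> {batch_after[0]}-{batch_after[1]}")
```

With this baseline, a 30-step image goes from 15.0 s to roughly 8.6-11.5 s, and the largest batch grows from 2 to 8-12 images.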

Important! Forge achieves such impressive results through serious resource optimizations. That’s why on powerful graphics cards like the RTX 4090, where memory is rarely the bottleneck, the performance gain will be less noticeable.

But the speed improvement is only the tip of the iceberg in Forge’s capabilities. Its main “feature” is the revolutionary Unet Patcher. This is a mechanism that allows developers to easily create extensions and introduce new methods to improve generation quality.

Thanks to Unet Patcher, complex techniques like FreeU, HyperTile, ControlNet and others can be implemented in about 100 lines of code! This means that with the growing popularity of Forge, we may see a real boom of advanced AI algorithms for Stable Diffusion.

In addition, Forge already integrates new samplers, support for Stable Video Diffusion for video generation, the Z123 technique and many other features not available in the original Web UI.

GitHub - lllyasviel/stable-diffusion-webui-forge

In general, Forge is the choice for those who want to squeeze the most out of their hardware. Thanks to its optimizations, you can work with high resolutions and large batches on modest graphics cards. And the Unet Patcher provides access to the most advanced AI algorithms in one convenient solution.

Conclusion

To summarize our review, we can conclude that the modern market offers a wide range of tools for working with Stable Diffusion on a local computer. Each of the GUI solutions we reviewed – InvokeAI, Forge and Fooocus – has its own unique features and advantages.

Fooocus is probably the easiest and most intuitive option for beginners. Here, no settings are actually required from the user – just enter a text prompt and the system will generate high-quality images by itself. However, this simplicity hides powerful query processing algorithms.

InvokeAI occupies a more serious niche. This tool is aimed at experienced artists and professionals seeking the most flexible control over the generation process. InvokeAI offers a universal canvas to work with, advanced customizations and support for a wide range of models and extensions. Of course, such functionality requires certain skills from the user.

But the most interesting solution is Forge. This tool combines performance optimization with support for the most modern technologies for working with Stable Diffusion. Forge allows you to speed up generation and increase resolution even on “weak” video cards. And thanks to the unique Unet Patcher mechanism, developers can easily incorporate the latest quality improvement algorithms into Forge.

Overall, all three solutions give users easy and convenient access to the power of Stable Diffusion. The key is to decide which aspects matter to you personally and choose the appropriate tool. Whatever your preferences, each of them can become a great assistant in creating visual masterpieces.

Mini Dictionary:

Embedding – a vector representation of text that the model uses to process prompts. Custom embeddings can add new concepts or styles without retraining the model.
LoRA (Low-Rank Adaptation) – a method of training small add-ons that are applied on top of the base model to change or improve what it generates.
Video memory (VRAM) – the memory on a video card used to process and store data during image generation.
FreeU – a technique that re-weights features inside the U-Net, including filtering in Fourier space, to improve the quality of details.
HyperTile – a method of splitting the image into blocks (tiles) inside the model to make generation at very high resolutions faster.
ControlNet – a technique for controlling the generation process with additional guide images such as edge maps, depth maps or poses.
Sampler – an algorithm that removes noise from the image step by step during generation. Different samplers affect the speed and the visual characteristics of the result.
Batch – a group of images that are generated simultaneously during one Stable Diffusion pass. Increasing the size of a batch allows you to create more images at a time.