With Stable Diffusion DreamBooth, you can now generate AI art using your own images as training data.
For example, you can generate images with yourself or a loved one as a popular video game character, as a fantastical creature, or just about anything you can think of – you can generate a sketch or a painting of your pet as a dragon or as the Emperor of Mankind.
You can also train your own styles and aesthetics like aetherpunk/magicpunk, or maybe people’s facial expressions like Zoolander’s Magnum (I haven’t tried this yet).
Stable Diffusion is one of the best AI art generators, and it has a free, open-source version that we’ll be using in this tutorial.
Google Colab is a cloud service offered by Google, and it has a generous free tier. That’s what we’ll be using to fine-tune Stable Diffusion, so you don’t need a powerful GPU or any strong hardware for this tutorial.
[powerkit_alert type="info" dismissible="false" multiline="false"]
In machine learning, fine-tuning means adjusting a model that was trained on one dataset to work with a new, related dataset. This can make your model work better on the new dataset or help it work better in a new situation. A dataset, in our case, is a bunch of pictures and some words that tell a machine what it should be looking for in order to generate new images.
[/powerkit_alert]
Table of Contents
- Quick Video Demos
- Use DreamBooth to Fine-Tune Stable Diffusion in Google Colab
- Test the Trained Model (with Stable Diffusion WebUI by AUTOMATIC1111)
- Upload Your Trained Model to Hugging Face
- FAQ
- Troubleshooting
- Conclusion
- Very Useful Resources
Quick Video Demos
Training Multiple Subjects on Stable Diffusion 1.5
This is a quick video of me fine-tuning Stable Diffusion with DreamBooth from start to finish. In this example, I’m fine-tuning it using 10 images of the Sandman from The Sandman (TV Series) and 10 images of Aemond Targaryen from House of the Dragon.
The following are two more video demos training a single subject on a model rather than multiple subjects on the same model. This is because when training multiple similar subjects on the same model, like faces of the same gender, they might get blended together.
Training Aemond Targaryen on Stable Diffusion 1.5
Training Sandman on Stable Diffusion 2.1
Sidenote: AI art tools are developing so fast it’s hard to keep up.
We set up a newsletter called tl;dr AI News.
In this newsletter we distill the information that’s most valuable to you into a quick read to save you time. We cover the latest news and tutorials in the AI art world on a daily basis, so that you can stay up-to-date with the latest developments.
Check tl;dr AI News
Use DreamBooth to Fine-Tune Stable Diffusion in Google Colab
Prepare Images
Choosing Images
When choosing images, it’s recommended to keep the following in mind to get the best results:
- Upload a variety of images of your subject. If you’re uploading images of a person, try something like 70% close-ups, 20% from the chest up, 10% full body, so Stable Diffusion also gets some idea of the rest of the subject and not only the face.
- Try to change things up as much as possible in each picture. This means:
- Varying the body pose
- Taking pictures on different days, in different lighting conditions, and with different backgrounds
- Showing a variety of expressions and emotions
- When generating new images, whatever you capture will be over-represented. For example, if you take multiple pictures with the same green field behind you, it’s likely that the generated images of you will also contain the green field, even if you want a dystopic background. This can apply to anything, like jewelry, clothes, or even people in the background. If you want to avoid seeing that element in your generated image, make sure not to repeat it in every shot. On the other hand, if you want it in the generated images, make sure it’s in your pictures more often.
- When training on faces, it’s recommended that you provide about 10 images of your subject to get great results.
- A note on training multiple subjects:
- Training multiple subjects of the same gender on the same model is very likely to lead to blending between them. You may notice Sandman having one eye a bit different, which he “inherits” from Aemond’s eyepatch.
- To mitigate the blending of multiple subjects, the author of the notebook (TheLastBen) recommended using UNet_Learning_Rate: 2e-6 instead of the default 5e-6. However, he recommends training a subject on a separate model to get the best results.
Resize & Crop to 512 x 512px
Once you’ve chosen your images, you should prepare them.
First, we need to resize and crop our images to 512 x 512px. We can easily do this using the website https://birme.net.
To do this, just:
- Visit the website
- Upload your images
- Set your dimensions to 512 x 512px
- Adjust the cropping area to center your subject
- Click on Save as Zip to download the archive.
- You can then unzip it on your computer, and we’ll use the images a bit later.
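If you’d rather do the resizing and cropping locally instead of through a website, here’s a minimal Python sketch using the Pillow library. The folder names (raw_images, prepared) are placeholders for your own folders; it scales each photo so the shorter side is 512px, then crops the center square.

```python
# Minimal sketch: resize and center-crop every image in a folder to 512 x 512px.
# "raw_images" and "prepared" are placeholder folder names - use your own.
from pathlib import Path
from PIL import Image

SIZE = 512
src = Path("raw_images")
dst = Path("prepared")
dst.mkdir(exist_ok=True)

for path in sorted(src.glob("*.jpg")):
    img = Image.open(path).convert("RGB")
    scale = SIZE / min(img.size)                       # scale the shorter side to 512px
    img = img.resize((round(img.width * scale), round(img.height * scale)))
    left = (img.width - SIZE) // 2                     # then crop the centered 512x512 square
    top = (img.height - SIZE) // 2
    img.crop((left, top, left + SIZE, top + SIZE)).save(dst / path.name, quality=95)
```

Note that birme.net lets you adjust the crop area per image, which this simple center crop can’t do, so check the results and re-crop any image where the subject isn’t centered.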
Renaming Your Images
We’ll also want to rename our images to contain the subject’s name:
- Firstly, the subject name should be one unique/random/unknown keyword. This is because Stable Diffusion also has some knowledge of The Sandman from sources other than the one played by Tom Sturridge, and we don’t want it to get confused and make a combination of interpretations of The Sandman. As such, I’ll call it Sandman2022 to make sure it’s unique.
- Secondly, rename the images to subject (1), subject (2) .. subject (30). This is because, using this method, you can train multiple subjects at once. If you want to fine-tune Stable Diffusion with Sandman, your friend Kevin, and your cat, you can give it prepared images for each of them. For the Sandman, you’d have Sandman2022 (1), Sandman2022 (2) … Sandman2022 (30); for Kevin, you’d have KevinKevinson2022 (1), KevinKevinson2022 (2) … KevinKevinson2022 (30); and for your cat, you’d have DexterTheCat (1), DexterTheCat (2) … DexterTheCat (30).
Here’s me renaming my images for Sandman2022 in bulk on Windows. Just select them all, right-click one of them, click Rename, give it the name you want, and click anywhere to finish the renaming. Everything else will be renamed as well.
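If you have a lot of images or prefer scripting it, here’s a minimal Python sketch that does the same bulk rename. The folder name and the Sandman2022 keyword are just the examples from this tutorial; swap in your own.

```python
# Minimal sketch: rename every image in a folder to "Keyword (1).jpg", "Keyword (2).jpg", ...
# "prepared" and "Sandman2022" are placeholders - use your own folder and unique keyword.
# Assumes the files don't already use the target names (otherwise renames could collide).
from pathlib import Path

keyword = "Sandman2022"
folder = Path("prepared")

for i, path in enumerate(sorted(folder.glob("*.jpg")), start=1):
    path.rename(folder / f"{keyword} ({i}){path.suffix}")
```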
When it’s time to upload my images to DreamBooth, I’ll want to train it for Sandman2022 and AemondHoD, and this is what my images will look like:
Open Fast Stable Diffusion DreamBooth Notebook in Google Colab
Next, we’ll open the Fast Stable Diffusion DreamBooth Colab notebook: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb
[powerkit_alert type="info" dismissible="false" multiline="false"]
The notebook we’re using is an implementation of DreamBooth by TheLastBen. The author optimized multiple implementations of Stable Diffusion. You can check the GitHub repository here.
[/powerkit_alert]
Enable GPU
Before running the notebook, we’ll first have to make sure Google Colab is using a GPU. This is because GPUs can process much more data than CPUs, which allows us to train our machine learning models faster.
To do this:
- In the menu, go to Runtime > Change runtime type.
- A small popup will appear. Under Hardware accelerator, make sure you have selected GPU. Click Save when you’re done.
Run First Cell to Connect Google Drive
By running the first cell, we’ll start connecting our notebook to Google Drive so we can save all of our files in it – this includes the Stable Diffusion DreamBooth files, our fine-tuned models, and our generated images.
After running the first cell, we’ll see a popup asking us if we really want to Connect to Google Drive.
After we click it, we’ll see another popup where we can select the account we want to connect with and then allow Google Colab some permissions to our Google Drive.
Run the Second Cell to Install Dependencies
Next, just run the second cell. There’s nothing for us to do there except wait for it to finish.
Run the Third Cell to Download Stable Diffusion
Next, we’ll want to download our Stable Diffusion model. This is the base model that we’ll train. You’ll notice that there are 3 default models available when clicking the Model_Version dropdown:
- 1.5: This is still the most popular model. It knows many artists and styles and is the easiest to play with. Most custom models you’ll find are still trained on 1.5 at the time of writing.
- V2.1-512px: V2.1 is the latest base model available. It excels at photorealism, but it’s more difficult to use, and it’s had a lot of artists and NSFW content taken out. If you know how to use it, you can get some excellent results. 512px means that it can generate 512px images, which you can then upscale to a higher resolution. I recommend checking out this Reddit thread discussing 2.1 vs 1.5 to get a better idea of the differences between them.
- V2.1-768px: This is similar to the V2.1-512px; however, it can generate 768px images, which means they’ll have more detail than 512px ones. This will require more RAM, however. (This model hasn’t worked for me on Google Colab free.)
If you just want the base model, just select one of the versions from the dropdown and run the cell.
If you want a custom model, then you’ll need to choose the custom model version and provide the path to it.
Custom Models:
- Custom_Model_Version: Custom models are also trained from base models. For the notebook to know the algorithm to use when training the model later on, it will need the model version it’s trained on (1.5, V2.1-512px, or V2.1-768px). Make sure to get this right.
- Path_to_HuggingFace: If you want to load and train over a different model from Hugging Face than the default one, you can provide the path to it. For example, if you want to train Stable Diffusion to generate pictures of your face but in Elden Ring style, you could get this already fine-tuned model https://huggingface.co/nitrosocke/elden-ring-diffusion. The path you should provide is what comes after huggingface.co. In our case, that’s nitrosocke/elden-ring-diffusion.
- CKPT_Path or CKPT_Link: If you already have an existing Stable Diffusion model that you’d like to fine-tune, you can provide the path to it in CKPT_Path instead of Path_to_HuggingFace. Alternatively, if you have a Stable Diffusion model as a link to any online .ckpt file or a shareable Google Drive link, you can input it in the CKPT_Link field.
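If you’re not sure which file in a Hugging Face repository is the actual checkpoint (for CKPT_Link), you can list the repo’s files first. This is a minimal sketch using the huggingface_hub library, with the Elden Ring repo from above as the example:

```python
# Minimal sketch: list the checkpoint files in a Hugging Face model repo.
from huggingface_hub import list_repo_files

repo_id = "nitrosocke/elden-ring-diffusion"  # example repo from above
for name in list_repo_files(repo_id):
    if name.endswith((".ckpt", ".safetensors")):
        print(name)
```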
Setting Up DreamBooth
We can now get to setting up DreamBooth.
Here, we’ll input our Session_Name. This will be the name of the trained model that we’ll save. This is also where you’ll input previous sessions to load them, should you want to fine-tune them further. It can be anything you want.
Important: Don’t use spaces in the session name. Instead, use _ or -.
Run the cell after you input the session name.
Notes:
- Session_Name: This will be the name of your session and of your final model. You can name it anything. If you provide a name that doesn’t exist, it will create a new session; if you use the name of a session that already exists in your Google Drive in My Drive > Fast-DreamBooth > Sessions, then it will ask you whether you want to overwrite it or resume training it.
- Session_Link_optional: Instead of providing the Session_Name, you can provide the path to the session. For example, the path to mine will be /content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Aemond_Sandman.
- Model_Version: If you’re loading a previous session with a trained model, then select the Stable Diffusion version of that model.
Upload Your Instance Images
Next, you’ll see the Instance Images cell. This is where we upload our images.
If you run it, a Choose Files button will appear, allowing you to upload images.
Additional Options:
- Remove_existing_instance_images: If you already uploaded some images but want to remove them so you can run the cell and upload other images again, that’s what Remove_existing_instance_images is for. If you want to keep the previously uploaded images, then uncheck that box.
- IMAGES_FOLDER_OPTIONAL: If you have a folder on your Google Drive that already contains your images, just provide the path to it and then run the cell, instead of uploading the images from your computer.
- Crop_images: Check this if you haven’t already cropped them yourself. They’ll be cropped into squares, and you can set the crop size yourself. It’s left at 512 by default because 512 x 512 px is the usual image dimension used.
Start DreamBooth
Finally, we can run DreamBooth. We have a few configurations here.
The Training_Steps are what we care most about.
Training_Steps: The most important thing we can do here is set the training steps. We’ll want to set this to the total number of images we’ve uploaded multiplied by 100. I uploaded 10 images of Aemond and 10 images of the Sandman, so that’s a total of 20 * 100 = 2000 steps. If the model isn’t good enough, you can pick up where you left off and train it further.
You most likely won’t have to touch these options if this is your first time, but we’ll still explain them just in case:
- Resume_Training: You’ll check this box if you want to continue training the model after you’ve tested it.
- UNet_Training_Steps: These are the training steps (also referred to as iterations) that the model undergoes during the training process. Each step involves passing images through the model, computing the loss, and updating the model’s weights to improve its performance. UNet is a popular neural network architecture that Stable Diffusion uses to identify the different components within an image. A neural network architecture is like the blueprint that tells the computer how to connect all its different parts so it can learn how to draw the lines correctly.
- UNet_Learning_Rate: The learning rate is how quickly a model learns. More on the learning rate below.
- Text_Encoder_Training_Steps: The training steps for the text encoder. More on what the text encoder is below.
- Text_Encoder_Concept_Training_Steps: This is if you’re using concept images. Read about what concept images are below.
- Text_Encoder_Learning_Rate: The learning rate for the text encoder. More on the learning rate below.
- External_Captions: To use external captions for images, just write a .txt file with the same name as the corresponding image, and upload the .txt files along with the images (see the sketch after this list). Read more about captions below.
- Style_Training: This is for when you’re training styles. I’m not sure what difference it makes yet. Will update when I do.
- Resolution: The resolution of the images you’re training the model on. I have only tested with 512px.
- Save_Checkpoint_Every_n_Steps: This option enables you to save the fine-tuned model at different points during the training. This is useful if you suspect that the number of steps you’ve set is too many and will overtrain Stable Diffusion. Overtraining is where the model ends up generating almost exactly the pictures you gave it, which means it won’t generate original work.
- Save_Checkpoint_Every: The model will be saved every X steps you set. If you set it to 500, it will save at 500, then 1000, then 1500, and so on.
- Start_saving_from_step: This is the minimum number of steps from which DreamBooth will start to save the model.
- Disconnect_after_training: You may want to leave it to train the model and forget about it. Google Colab free might not let you use a GPU for the rest of the day if you use it for too long, and Colab Pro uses up units. If you know you won’t be around when it’s done training, then it’s best to check this box.
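If you decide to use External_Captions, here’s a minimal sketch for generating the caption .txt files. The folder name and caption text are placeholders; in practice you’d write a different caption per image.

```python
# Minimal sketch: create a caption .txt file next to each image, using the same base filename.
# "prepared" and the caption text are placeholders - replace them with your own.
from pathlib import Path

folder = Path("prepared")
caption = "photo of Sandman2022, a man in a long black coat"  # example; customize per image

for img in sorted(folder.glob("*.jpg")):
    img.with_suffix(".txt").write_text(caption)
```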
Finally, you can run the cell when you’re done with the options.
This can take a while. In my case, 2000 steps took about 35 minutes with Google Colab free, using an Nvidia Tesla T4 GPU and decreased learning rate. If you’re training only one subject with default settings, then it should take you about 15-20 minutes.
Where Your New Model is Stored
When it’s done, you should find your model in your Google Drive. For example, here’s where Aemond_Sandman.ckpt was saved with the default output folder settings. This should be My Drive > Fast-DreamBooth > Sessions > Your_Session_Name.
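If you want to confirm from a Colab cell that the checkpoint was actually written (and see how big it is), here’s a minimal sketch; it assumes Drive is mounted at /content/gdrive and the default session folder shown above:

```python
# Minimal sketch: list the .ckpt files saved by your DreamBooth sessions in Google Drive.
from pathlib import Path

sessions = Path("/content/gdrive/MyDrive/Fast-Dreambooth/Sessions")
for ckpt in sessions.rglob("*.ckpt"):
    print(ckpt, f"{ckpt.stat().st_size / 1e9:.2f} GB")
```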
Test the Trained Model (with Stable Diffusion WebUI by AUTOMATIC1111)
After the training cell has finished running, we can test our new fine-tuned Stable Diffusion model.
This notebook comes with Stable Diffusion WebUI by AUTOMATIC1111, which is the most popular implementation of Stable Diffusion, and offers us a very convenient web user interface.
We have a few options:
- If you have just fine-tuned Stable Diffusion for the first time (this is us, most likely) and want to test your newly created model, then just run the Test the trained model cell. No need to fill out anything.
- Previous_Session: If you have previously fine-tuned a different Stable Diffusion model that you want to test, then insert it as the Previous_Session. Since I’m assuming this is our first time, you can leave this empty.
- Use_Custom_Path: If you have a model you want to load that’s in some folder in your Google Drive, then check Use_Custom_Path, and after you run the cell, you’ll see a field to provide the path to your model.
- You can leave Use_localtunnel unchecked. This is how our link will be generated for us to access the Stable Diffusion WebUI. When unchecked, it uses the servers of Gradio.app to generate a URL, as you see in the screenshot above. If it’s checked, then it uses a service called localtunnel. We have both these options available in case one of them doesn’t work.
When you run the cell, it will take about 5 minutes for Stable Diffusion WebUI to be ready to use. When it’s done, you’ll see a URL like https://31767n39-6e5b-46fe.gradio.live when Use_localtunnel is left unchecked, or like https://fancy-spies-punch-34-150-175-108.loca.lt when it’s checked.
Click the URL and the user interface will open in a new tab, where you can start generating images right away.
Where Generated Images Are Stored
Images are stored by default in your Google Drive in My Drive > sd > stable-diffusion > outputs > txt2img-images.
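If you’d like to download everything you’ve generated in one go from within Colab, here’s a minimal sketch; it assumes the default output path shown above, so adjust it if yours differs:

```python
# Minimal sketch: zip the generated images stored in Drive and download the archive from Colab.
import shutil
from google.colab import files

out_dir = "/content/gdrive/MyDrive/sd/stable-diffusion/outputs/txt2img-images"
shutil.make_archive("generated_images", "zip", out_dir)  # creates /content/generated_images.zip
files.download("generated_images.zip")
```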
Upload Your Trained Model to Hugging Face
You can also upload your trained model to Hugging Face, to the public library or just have it privately in your account. You can change it to public later on.
To do this, you’ll have to use a Hugging Face token that has the WRITE role. Simply create a token and set its role to WRITE.
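If you want to double-check that your token works before running the cell, here’s a minimal sketch using the huggingface_hub library (the token value is a placeholder):

```python
# Minimal sketch: verify that a Hugging Face token is valid and see which account it belongs to.
from huggingface_hub import HfApi

token = "hf_xxx"  # placeholder - paste your WRITE token here
print(HfApi().whoami(token=token))  # prints your account details if the token is valid
```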
In the Upload The Trained Model to Hugging Face cell, you’ll have the following:
- name_of_your_concept: This will be the name of the model you upload, and it will show up in your URL as well. For example, https://huggingface.co/ByteXD/aemondhod-sandman2022.
- Save_concept_to: You can select between your profile (by default, it will be set to private) or the public library.
- hf_token_write: This is your Hugging Face token. Make sure it has its role set to WRITE.
Run the cell when you’re done. It will take ~10-15 minutes to finish uploading. When it’s done, it will output a link to where you can access your newly uploaded model.
This is what it looks like: https://huggingface.co/ByteXD/aemondhod-sandman2022-feb2023. To make it publicly accessible, go to the model’s settings section and click on Make this model public.
FAQ
What is a .ckpt file?
The .ckpt file extension is commonly used for checkpoint files, and the file is also referred to as the weights of the model. Although not exactly accurate, we can think of it as the model file. Checkpoint files are used to save the state of a program or process at a particular point in time.
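If you’re curious what’s actually inside one, here’s a minimal sketch that opens a checkpoint with PyTorch and lists what it contains. The filename is a placeholder; point it at a .ckpt you’ve downloaded.

```python
# Minimal sketch: peek inside a Stable Diffusion .ckpt file.
# The filename is a placeholder - use your own downloaded checkpoint.
import torch

ckpt = torch.load("Aemond_Sandman.ckpt", map_location="cpu")
print(list(ckpt.keys()))                # usually includes a 'state_dict' entry
state = ckpt.get("state_dict", ckpt)    # some checkpoints store weights at the top level
print(len(state), "weight tensors")
```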
What is the learning rate, and how does it affect generated images?
The “learning rate” is a tuning parameter that determines how quickly a model learns from the data during training.
Imagine you are trying to learn a new skill, such as playing the piano. If you practice too slowly, you may not make much progress, and it may take a long time to learn. On the other hand, if you practice too quickly, you may make mistakes and not really learn the skill well. The learning rate in machine learning is similar – it controls how quickly the model learns from the data.
A high learning rate means that the model will learn more quickly from the data, but it may also make more mistakes and not learn the best representation of the data. Conversely, a low learning rate means that the model will learn more slowly but may be more accurate in the long run.
A lower learning rate is not always better. In some cases, it may be better to have a higher learning rate and more steps, for example.
To find the best learning rate for UNet and text encoder, I recommend checking what others are saying from their experiments or experimenting yourself.
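To make this concrete, here’s a tiny toy sketch (not DreamBooth code, just an illustration) of gradient descent on a simple function with three different learning rates:

```python
# Toy illustration of how the learning rate affects training (not DreamBooth code).
# We minimize f(x) = x^2, whose minimum is at x = 0, starting from x = 5.
def minimize(lr, steps=20, x=5.0):
    for _ in range(steps):
        grad = 2 * x        # derivative of x^2
        x = x - lr * grad   # the basic update rule: step against the gradient, scaled by lr
    return x

print(minimize(0.01))  # too low: after 20 steps x is still ~3.3, far from the minimum
print(minimize(0.1))   # reasonable: x ends up close to 0
print(minimize(1.1))   # too high: x overshoots back and forth and blows up to ~190
```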
What is the text encoder in DreamBooth?
The text encoder is the part of Stable Diffusion that converts your text prompt into numerical embeddings that the image model can understand. When DreamBooth also trains the text encoder, your unique keyword (like Sandman2022) becomes strongly associated with your subject, which generally improves how well prompts containing that keyword reproduce it.
What are concept images in DreamBooth?
Prior preservation loss proved to be a weak method for regularizing Stability’s model, so I implemented concept images to replace class images, and they act as heavy regularization. They force the text encoder to widen its range of diversity after getting narrowed by the instance images.
Example: you use 10 instance pictures of a specific car in a garage using the identifier “crrrr“, without concept images. At inference, most of the output when you use the term “crrrr” or even car will be a car inside a garage, because the terms car and garage are overwritten by the embeddings from the instance images when training the text encoder.
Now you start over with the same instance images, but this time you add 200 concept images of different cars in different places and set the text_encoder_concept_steps to 1500 steps. Now the text encoder receives more information about the token “car” and is able to put it in different situations. At inference you will get relatively diverse situations for the car whilst keeping the same characteristics of the trained car.
TL;DR: Concept images are like class images but during training, they are treated as instance images without including the identifier and trained only to the text encoder to help with diversity and variety.
The filename doesn’t matter for concept images or the resolution. All that matters is the content of the image.
Troubleshooting
In this section, we’ll address some common errors.
ModuleNotFoundError: No module named ‘modules.hypernetworks’
As of writing this, this error should be fixed. If you’re encountering it then we’ll want to do a clean run of the DreamBooth notebook. To do this:
- Delete your sd folder in your Google Drive, located at My Drive > sd.
  - Important – Back Up Previously Generated Images: If you previously generated images, back them up. They are located in My Drive > sd > stable-diffusion > outputs. They’ll be deleted if you don’t back them up.
- Then make sure you’re running the latest Colab notebook (in case you were running one saved to your Google Drive): https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb
That’s it. Now it should work. We just had to do a fresh run of the DreamBooth notebook without the previously stored files.
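If you’d rather do the backup and deletion from a Colab cell instead of the Google Drive web interface, here’s a minimal sketch; it assumes Drive is already mounted at /content/gdrive (the notebook’s first cell does this):

```python
# Minimal sketch: back up the generated images as a zip, then delete the sd folder
# so the notebook can do a fresh run. Assumes Drive is mounted at /content/gdrive.
import os
import shutil

drive = "/content/gdrive/MyDrive"
outputs = f"{drive}/sd/stable-diffusion/outputs"

if os.path.exists(outputs):
    shutil.make_archive(f"{drive}/outputs_backup", "zip", outputs)  # back up first
shutil.rmtree(f"{drive}/sd", ignore_errors=True)                    # then remove the sd folder
```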
Next, if you were interrupted before getting to train your model, you can continue with the instructions below.
ImportError: cannot import name ‘VectorQuantizer2’
To fix this:
- Run the following command in any cell: !pip install taming-transformers-rom1504
- Then restart the runtime by going in the menu to Runtime > Restart runtime.
- Now, if you run everything again, it should work.
Next, if you have already fine-tuned your model, to get back to testing it quickly, follow the instructions below.
If You Just Trained a Model but Didn’t Get to Test It Because of an Error
If you got an error in the last cell, Test the trained model, fixed it, and have now restarted the notebook, you don’t have to go through the training again.
- Just run the first two cells (Connect to Google Drive and Setting up the environment).
- After that’s done, go to the Test the trained model cell, insert your INSTANCE_NAME from earlier (Sandman2022 in my case) or Use_Custom_Path (if you have it somewhere else other than My Drive), and run the cell. It should work. When you trained the model earlier, it got saved in your Google Drive, so now the notebook will just load your fine-tuned Stable Diffusion model.
Conclusion
In this tutorial, we covered how to fine-tune Stable Diffusion using DreamBooth via Google Colab for free to generate our own unique image styles. We hope this tutorial helped you break the ice in fine-tuning Stable Diffusion.
If you encounter any issues or have any questions, please feel free to leave a comment, and we’ll get back to you as soon as possible.
Very Useful Resources
- A Discord Server for Stable Diffusion DreamBooth – This is a Discord community dedicated to experimenting with DreamBooth. It can be of great help in better understanding how to use DreamBooth to get desired results, such as how to better design prompts and other troubleshooting tips, so I highly recommend it! Also, fun fact: one of the moderators is Joe Penna (Mystery Guitar Man), who many might know from his very popular YouTube channel years ago.
- Lexica.art, OpenArt.ai, Krea.ai – these are search engines for Stable Diffusion prompts, along with their resulting images. They are fantastic for inspiration, and you can easily get some great results by using them.
- Civitai.com – this is a hub where you can find/share Stable Diffusion models.
- The Wiki for AUTOMATIC1111’s Stable Diffusion WebUI – The web interface we’re using when generating images has many features, as you might have seen. This wiki is the documentation for all those features, and it gives us a great overview of how to use the interface. I highly recommend you check it out.
Comments
Thank you very much for this
Hi! Thank you for your feedback! Glad to help!
getting this error when running “test the trained model”. it was working yesterday
Traceback (most recent call last):
File “/content/gdrive/MyDrive/sd/stable-diffusion-webui/webui.py”, line 32, in <module>
import modules.hypernetworks.hypernetwork
ModuleNotFoundError: No module named ‘modules.hypernetworks’
Hi Dave. Thank you for pointing this out. I’m looking into it and will get back to you with a solution.
Hi Dave. Turns out this has been fixed. I’ve added a short section on this. First to fix the error. And then to pick up where you left off to get straight to testing the model.
Let me know if this works? Thank you!
I only have a 3080 with 10 GB. Can I run this? The other options don’t work for me because I have very little space on my C: drive and can’t install WSL and Python and all these other choices. It doesn’t seem to say what the system requirements are here, and it seems to work magically in your example.
Hi. Thanks for commenting. This tutorial is for using it on Google Colab, which is in the cloud, so you don’t have to have strong hardware. I haven’t tried it out on my local PC. You just go here https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb, and it will be like you’re using Google Docs, but for Python.
Let me know if that helps? Thank you!
How can I continue training a ckpt file I already created?
Where would I put it or transform it into the files I need?
(In the collab)
thank you!
Hey there. At the end of the training, I get an error. It simply says “Something went wrong”. Here are the last few lines of the output. Any idea what might have gone wrong and how to fix it?
File “/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py”, line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command ‘[‘/usr/bin/python3’, ‘/content/diffusers/examples/dreambooth/train_dreambooth.py’, ‘–train_text_encoder’, ‘–pretrained_model_name_or_path=/content/stable-diffusion-v1-5’, ‘–instance_data_dir=/content/data/SteveJP’, ‘–class_data_dir=/content/data/Man’, ‘–output_dir=/content/models/SteveJP’, ‘–with_prior_preservation’, ‘–prior_loss_weight=1.0’, ‘–instance_prompt=photo of SteveJP Man’, ‘–class_prompt=a photo of a Man, ultra detailed’, ‘–seed=45576’, ‘–resolution=512’, ‘–mixed_precision=no’, ‘–train_batch_size=1’, ‘–gradient_accumulation_steps=1’, ‘–gradient_checkpointing’, ‘–use_8bit_adam’, ‘–learning_rate=2e-6’, ‘–lr_scheduler=constant’, ‘–lr_warmup_steps=0’, ‘–center_crop’, ‘–max_train_steps=1500’, ‘–num_class_images=200′]’ returned non-zero exit status 1.
Something went wrong
Hi. Apologies for the delay. Did you manage to fix this in the meantime?
Hi, I have the same problem
Same problem here
Hello. Thanks for commenting.
Does it happen after it finishes training, or you run the cell, it does a few things, and then you get this error?
Same problem here, it happens just after clicking “run”
Hi, I’m trying to test a previously trained model, but keep getting the error: ImportError: cannot import name ‘script_callbacks’ from ‘modules’ (unknown location)
Hi. I haven’t encountered this error so far and tested the latest notebook recently.
When I get an error I check “Update repository” so it updates to the latest version before running the Stable Diffusion WebUI. Have you tried that by any chance or did you fix the issue in the meantime?
Hi. Great article, thank you.
How to bypass the error “ImportError: cannot import name ‘script_callbacks’ from ‘modules’ (unknown location)” when testing the trained model?
Hi. I haven’t encountered this error so far and tested the latest notebook recently.
When I get an error I check “Update repository” so it updates to the latest version before running the Stable Diffusion WebUI. Have you tried that by any chance or did you fix the issue in the meantime?
This probably will work. Now I have the issue that I cannot connect to a GPU backend due to usage limitations.
I can connect without GPU, but I suppose it won’t work, correct?
Yes, it won’t work without a GPU unfortunately.
figured it out, you have to check the box “update repo” to successfully run from drive
Nice! Glad to hear. Thanks for the update!
Hi, I’m a total newbie. I made my pictures, but where and how do I upload the images?
Hi. Thanks for commenting. I see the notebook was updated with a new method that I haven’t covered in the tutorial. I’ll update this today.
Regarding uploading images – a small button Choose Files should appear after you run the Instance Images cell.
You can either press it and then navigate to where you have your images and select them all.
Also, I see that you have to rename your pictures to something like your_unique_keyword (1).jpg, your_unique_keyword (2).jpg, etc. Then when you use Stable Diffusion you’d use your_unique_keyword to describe who you have in your picture. In my case it’s Sandman2022.
Please make sure you have your images renamed like that, replacing your_unique_keyword with your own keyword.
If you have any more questions please let me know! Your feedback is really useful because I want the article to be easy to understand and beginner friendly.
I’m getting the following error with a saving regimen of every 500 steps starting at step 500. I’ve successfully trained models previously, so I don’t know what I’ve done:
Steps: 2% 499/27200 [08:09<7:14:05, 1.03it/s, loss=0.0131, lr=1.97e-6] SAVING CHECKPOINT: /content/gdrive/MyDrive/mustaf_mv_session_1_step_500.ckpt
Traceback (most recent call last):
File “/content/diffusers/examples/dreambooth/train_dreambooth.py”, line 733, in <module>
main()
File “/content/diffusers/examples/dreambooth/train_dreambooth.py”, line 701, in main
if args.train_text_encoder and os.path.exists(frz_dir):
UnboundLocalError: local variable ‘frz_dir’ referenced before assignment
Steps: 2% 499/27200 [08:38<7:42:06, 1.04s/it, loss=0.0131, lr=1.97e-6]
Traceback (most recent call last):
File “/usr/local/bin/accelerate”, line 8, in <module>
sys.exit(main())
File “/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py”, line 43, in main
args.func(args)
File “/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py”, line 837, in launch_command
simple_launcher(args)
File “/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py”, line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command ‘[‘/usr/bin/python3’, ‘/content/diffusers/examples/dreambooth/train_dreambooth.py’, ‘–image_captions_filename’, ‘–train_text_encoder’, ‘–save_starting_step=500’, ‘–stop_text_encoder_training=2720’, ‘–save_n_steps=500’, ‘–pretrained_model_name_or_path=/content/stable-diffusion-v1-5’, ‘–instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/mustaf_mv_session_1/instance_images’, ‘–output_dir=/content/models/mustaf_mv_session_1’, ‘–instance_prompt=’, ‘–seed=96576’, ‘–resolution=512’, ‘–mixed_precision=fp16’, ‘–train_batch_size=1’, ‘–gradient_accumulation_steps=1’, ‘–use_8bit_adam’, ‘–learning_rate=2e-6’, ‘–lr_scheduler=polynomial’, ‘–center_crop’, ‘–lr_warmup_steps=0’, ‘–max_train_steps=27200′]’ returned non-zero exit status 1.
Something went wrong
Hello. Thank you for commenting and apologies for the delay. This seems to have been an issue that was fixed the same day.
Hopefully you got it to work soon after posting this.
I have trained two models and want to use them in checkpoint merger. But only one of them is in the list. What should I do to make both models appear in the list? Unfortunately in the video I did not understand how to do it. Thank you very much for your work.
It works just great. But how do I name the files if I want to teach SD not a face, but an artist’s style, which is not in the dataset?
Hello. Apologies for the delay in responding.
It has now been updated to give you an option to select whether your style “contains faces”.
I have just tested it by training it on magic steampunk cityscapes and called it aetherpunk_style and seems to work well https://imgur.com/a/8VL02QY
Hi
I’m getting this error when I try to test the model – everything up to this point seemed to work fine
“Warning: Taming Transformers not found at path /content/gdrive/MyDrive/sd/stable-diffusion/src/taming-transformers/taming”
Help!!!
Hi. Thanks for commenting. Can you run this in any cell and try testing it again after?
Thanks for your help, everything is working great now thanks, quick question, can you combine 2 different subjects in one image using this method? sandman and the other guy for example
Nice. Glad to hear. I hadn’t tested it until now. I’d say it works.
that looks awesome thanks!! I was thinking more along the lines of having both subjects in the same image, it just combines them when i try
Oh, I see now. Yes, I’ve had the same issue and haven’t found a solution for it yet.
I haven’t tried enough to get it right, so maybe there are some prompt and settings tweaking that could get it working, but I can’t say for sure.
Yeah, same. It’s unpredictable; sometimes you get a great image, but for every great one there are maybe 6 bad ones. I’ve tried tips from different forums but you can’t bank on it. If you have hours to kill you’ll get some good ones, occasionally.
Hi, great tutorial. I got an error when testing this with the EMA version of the model. How could that be solved? Thanks.
Hello. I haven’t tried the EMA version of the model, however I can try it out. What is the error you’re getting?
Hi there! Big question.
Once I have my model ready and trained (I’m currently setting up a new one with over 100 pictures for reference), can I use my local stable diffusion webUI client with my custom model? If so, how?
I’m asking because, if I can, I’d rather leave the free space for someone who wants to use Collab for themselves.
Hi! Thanks for commenting. Yes you can. The model is a .ckpt file that you can download to your computer.
The model you just trained with DreamBooth should be located in My Drive > Fast-DreamBooth > Sessions > MyTrainedModel (where MyTrainedModel is what you named it).
Then, if you place it in your Stable Diffusion folder on your computer, it should work. The folder where the models are located is something like stable-diffusion-webui > models > Stable-diffusion.
Then, when you’re running the Stable Diffusion WebUI, you can select the model you want to use from the top-left dropdown in the interface.
Let me know if it works or if you have any issues?
Thank you!
Hi,
I am able to download the model (ckpt file) from my Google Drive to my computer, but I do not have (or can’t find) the Stable Diffusion folder on my computer (Mac).
And also, I don’t know how to get the local file in the StableDiffusion Web UI – the dropdown only contains a single entry and no way to link/open a local model…
Hi, thanks for commenting. I haven’t used Stable Diffusion on a Mac. Which one did you download? The one by AUTOMATIC1111?
This was probably a misunderstanding of mine. I thought you could use to local web-client also locally/without being connected to a runtime.
I have downloaded a Stable Diffusion Mac App from https://diffusionbee.com/download and it works quite nicely so far.
Oh, I see. I’m glad to hear it worked out!
hi, thanks for the guide. everything works. but I have encountered such a problem. after some time, the resulting web UI throws a 404 error. How can I fix it?
Hi. Thanks for commenting. I haven’t encountered this. Have you also tried checking the Use_Gradio_Server checkbox? Perhaps that won’t have 404 issues after a while.
Forgot to attach an image.
I didn’t check the box. I’ll try with it (as I understand it). Thanks for the answer.
Thanks for this detailed guide!
I seem to struggle to train a subject with the latest version that has ‘Enable_text_encoder_training’ checkbox and a percentage value below.
I have tried repeatedly with different values for ‘Train_text_encoder_for’ ranging from 10 to 100 and I keep getting bad results.
I can’t tell whether I should train the model further (already trained for 3000 steps with 16 images).
The results either don’t look like the original subject or it’s impossible to stylize. In no case is it accurate.
Worth noting that I have previously managed to train 3 different models/subjects with 10-14 images each and the results were stellar!
Hi. I’m not very clear on the text encoder training either, but I was reading this article the other day https://huggingface.co/blog/dreambooth and it seems like also training the text encoder can make a big difference.
Not sure if this helps, but it’s good to know that it makes a difference.
Thanks guys!
Hi! Thanks for commenting! Glad to help!
Hi, I run the “Downloading the model” cell and then get an error. How do I fix it?
Hi, can you run the following in any cell and try downloading it again?
Hey, it works! I can continue with the following steps.
Nice! Glad to hear!
One more question: when the session disconnects, reconnecting resets everything to default and I need to run those steps again. Can it be saved to Google Drive?
Like this: I have saved every 500 steps, but it can’t find the previous settings.
I believe you have to run everything again. And in the Session_Name you have to put in the session name you previously used. Similar to loading a saved game.
After finishing the training it will create a .ckpt file. Can I later load the file again into another account or for a later test?
Hi, I already trained the model and successfully loaded the session, but I can’t create the WebUI link. How do I fix it?
Hi. Can you try again, but with Use_Gradio_Server checked?
Hi, now I can load the WebUI, but it shows something different from the picture: there’s no progress bar and it just goes back to Google Colab.
Is that because the share link expires in 72 hours?
Hi, thanks very much for this tutorial!
I got a problem in the image upload step: when I select the images and click the upload button, I get an error “MessageError: RangeError: Maximum call stack size exceeded.”
Do you have an idea how I can fix it?
Thanks in advance.
Jonathan Vaneyck
Hi, did you go to the link https://birme.net and resize the pictures to 512 x 512px?
Yes, I have used this website to resize all my pictures to 512 x 512 px. In addition, I have formatted the names as described in the tutorial.
I have found the issue… it is the file size of the image. If the image exceeds 100 KB, I get the error. So by decreasing the size, the upload succeeds.
Thanks! >50kb in my case
I keep getting this issue when trying to start the training. Any idea how to fix it?
Hi Ethan. Apologies for the late reply. Did you manage to get this fixed?
Sir, I got the same error. How do I fix this?
Hi. This is an awesome tutorial. Can you please make a tutorial on SD 2.0 with Google Colab in AUTOMATIC1111?
Hi, thank you for commenting. Will do as soon as a solution appears. Right now I see there are issues with just running regular image generation with AUTOMATIC1111 and SD 2.0 on Google Colab free.
I’ve kept looking for solutions since the SD 2.0 announcement. As soon as I find something I’ll leave a reply here and that should notify you.
Hi, thanks for the amazing tutorial!
I’m getting the following error when I try to run webui.py.
Any idea why?
I just run the entire notebook with no changes.
Can it be about the images?
Thanks!!
Hi. Apologies for the delay. I suspect that error only happened yesterday, after a certain update to the notebook. I think it should be fixed now. Did you manage to get it to work?
I am having the same error. And after the update, it still does not work. Do you know can I fix it? Thanks in advance
Me too!
Hi. Thanks for commenting. I fixed it by deleting the sd folder and running the cell again.
If you have any models saved in the /sd/stable-diffusion-webui/models directory, be sure to move them out of sd first, and you can put them back later.
Let me know if that works for you?
When deleting sd (or disconnecting/reconnecting and refreshing the page) I’m now able to run the cell, but no option to run the server is presented. Instead I see the following:
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading 100% 3.94G/3.94G [01:37<00:00, 40.5MB/s]
^C
I now get this error when creating new sessions. I get the feeling TheLastBen is making lots of changes right now for SD2 🙂 https://github.com/TheLastBen/fast-stable-diffusion/commits/main
I see. That’s always a possibility I also keep in mind. I’ll try later and get back to you.
I’m getting the exact same thing. Tried twice, and happened both times just a few minutes ago. Are you also trying to use SD 2.0?
Yes SD 2.0. I tried 1.5 again to check, and got a different error (: Will try to recreate it so I can share here
I’m now following this to get to this colab notebook, but again when starting the gradio app I run the cell but nothing appears, see image below
I think this is a TheLastBen issue as it seemed to work fine a couple of days ago, and there have been no commits merged in AUTOMATIC’s repo in this time. So I’ve created an issues here https://github.com/TheLastBen/fast-stable-diffusion/issues/824
Ah there is another issue posted for this here https://github.com/TheLastBen/fast-stable-diffusion/issues/794
Oh, you previously managed to train SD 2.0 and then ran it Google Colab Free using AUTOMATIC’s repo?
I kept on trying to use TheLastBen’s variant to keep things simple so that I could update the tutorial with a simple solution.
Fixed it by deleting the sd folder (or in my case renaming it). Thanks!
Hi. Thanks for commenting. I also had that error today. I fixed it by deleting the sd folder and running the cell again.
If you have any models saved in the /sd/stable-diffusion-webui/models directory, be sure to move them out of sd first, and you can put them back later.
Hello. Apologies for the delay! You probably fixed it by now, so leaving this here for posterity.
I fixed it by deleting the sd folder and running the cell again.
If you have any models saved in the /sd/stable-diffusion-webui/models directory, be sure to move them out of sd first, and you can put them back later.
Hi, thank you very much for the excellent tutorial. I get an error when fine-tuning using SD 2 and trying to test the trained model, as below:
File “<ipython-input-10-b1a5bc118eb8>”, line 187
NM==”True”:
^
SyntaxError: invalid syntax
any idea what the problem might be?
Hi. Thanks for commenting. I don’t know what it could be. I’m currently getting errors with SD2 myself and haven’t managed to train any models yet. I’ll leave a comment here, notifying you when I’m able to.
Hi, one question to ask:
I created a training run of 80000 steps and saved a checkpoint at 40000 steps. Can I resume training at 40001? Do I choose “Resume_Training”?
Hello. Yes, it will resume training at 40001.
(Although 40000 steps seem like a lot. I think you may overtrain the model. Then again, you probably know best.)
I chose resume training, but it still trains from step 1…
(Since I use 850 pictures, so at least x100.)
I understand. I never trained on so many pictures so I don’t know what the results will look like.
It sounds like you have a serious project. I hope it works out well.
Out of curiosity, why so many images? Are you training something like 30 different people in one session, each with 28 pictures?
Just two people… Can I reduce the training steps, for example 850 x 50?
The photo shows the GPU machine stopped running 🙁
Did it stop at step 39425?
Also have you trained it on 30 images of each person and results weren’t good enough?
Hi! Super useful content 🙂 Thank you very much.
I’m trying to customize shoes from this brand shoes53045.com.
I gave around 40 pictures, and the results are really impressive.
However, it’s hard to get very creative results. The prompt is almost not taken into account.
Is there something to set up before the TRAINING?
Thanks
Henri
Hi, thanks for commenting, and thank you so much for the kind words!
Yes, it’s a training thing, and this has happened to me at times as well. How many steps did you train it on, and what was the value for Train_text_encoder_for:?
Also, what Stable Diffusion version did you use? 1.5 or 2.0? (The one in this tutorial is 1.5 because 2.0)
I created a python script to rename all files within a directory. Because renaming files manually sounds like hell.
Hi. Thanks for commenting and for the script. On Windows, if you select everything and right click -> rename, then edit the name on a single file, it will rename every file you selected, like in this gif
I believe Mac has a similar functionality. And Linux has Rename All.
Does it not work for you?
Hello! I trained 2 models some weeks ago and everything went fine, but now when I try to train, I get errors (..json.decoder.JSONDecodeError: Expecting ‘,’ delimiter: line 12 column 3 (char 284)). The code on fast-DreamBooth has changed since the last time I used it.
Does anybody know what I’m doing wrong?
thx
have you solved the problem?
met the same question
Follow this: https://github.com/TheLastBen/fast-stable-diffusion/issues/997
Something went wrong. Could you explain the solution in more detail?
“A Discord Server for Stable Diffusion DreamBooth – This is a Discord community dedicated to experimenting with DreamBooth. ”– this invitation link is invalid now, how can I join this Discord?
Hi. Thanks for commenting. Updated the link. You can join here https://discord.gg/ReNsdBHTpW
This link doesn’t work either.
Hello. I’ve updated the link. There shouldn’t be any more issues now.
Hey, wondering if there’s room for an update to the article for the few settings that have changed since? Learning rate is now an option, which is very important.
Hi Jhong. Thanks for commenting. I’ll update it in the following days. Was on a tight schedule the past few days due to the holidays.
Hi EdXD,
Fab article and colab notebook, I was able to run it with ease. You also explain everything clearly which is great.
I can see that you said you’ll be updating the article soon as there are new features added. Can I ask you include something on captioning images when you update?
And one further question, when training would you recommend avoiding training on objects and styles together?
Thanks!
CP
Great tutorial thanks – just coming fresh to this & a newbie – all works until I try and upload image instances and it loads the first image and then the error below. I’ve tried smaller images but didn’t work – can you help please? Thanks
· thomas.jpg(image/jpeg) – 58846 bytes, last modified: n/a – 100% done
· thomas1.jpg(image/jpeg) – 68448 bytes, last modified: n/a – 0% done
—————————————————————————
MessageError Traceback (most recent call last)
<ipython-input-17-0bc75dc439cd> in <module>
83 elif IMAGES_FOLDER_OPTIONAL ==””:
84 up=””
—> 85 uploaded = files.upload()
86 for filename in uploaded.keys():
87 if filename.split(“.”)[-1]==”txt”:
3 frames
/usr/local/lib/python3.8/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
100 reply.get(‘colab_msg_id’) == message_id):
101 if ‘error’ in reply:
–> 102 raise MessageError(reply[‘error’])
103 return reply.get(‘data’, None)
104
MessageError: RangeError: Maximum call stack size exceeded.
Hi, just to add, I managed to get it to work by adding the photos to Google Drive, and then applying the link rather than uploading them directly from my Mac, not sure if that helps anyone else?
Hi. Apologies for the delay. Thank you for commenting and thank you for mentioning this. I also had this same issue at some point, and I think I solved it by either uploading the images directly into Drive or by unchecking Smart_Crop_Images.
Hi, I run the “Downloading the model” cell and then get an error. How do I fix it?
I am getting the same error 🙁
Thank you bro, I thought it’s going to be really hard to do that with collab, but it’s super easy actually.
Hi. Thanks for commenting! Glad to help!
I think TheLastBen deleted the Colab and created a new one. Can you please check the link? Thank you.
Hi. Thanks for commenting! It looks like the link is working https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb. Or are you referring to a different link?
Thanks, I found it too
Thank you!
Glad to help!