Lately I’ve been playing with Disco Diffusion, a tool that allows you to generate images based on textual input. It’s a really cool tool that can create some pretty interesting results.
To get said cool results, you need good inputs and to tweak the settings a bit. The text inputs are called “prompts” and they can be anything from a single word to a full sentence. The settings allow you to control how the neural network generates the image, and there are a lot of options to play with.
If you’d like a good tutorial to get started with Disco Diffusion, this is the first video I saw on it, and it’s very useful for getting a feel for it: Creating Videos with Diffusion Models (Disco Diffusion 4.1). For a deeper understanding, this is an excellent beginner-friendly tutorial: Zippy’s Disco Diffusion Cheatsheet v0.3. Don’t be intimidated by the article’s length; it’s a very pleasant read and not dry at all.
Anyway, as mentioned before, to get cool results you need to experiment with different settings and prompts, and typical images take 15-20 minutes to generate (at least with my settings).
After a while it can get a bit exhausting: you want to keep generating images, but because of work you have to keep switching context away from what you’re working on.
In this article I’ll describe a few ways I use GPT-3 to generate prompts for Disco Diffusion and similar software, so I can experiment more easily.
I don’t have an optimal solution for this, but I can share the GPT-3 prompts I’m using. I didn’t try to perfect them much and mostly use them for inspiration. Sometimes GPT-3 generates interesting prompts I wouldn’t have thought of, and I may use just part of them. I post some results on Twitter (@createdbyAI) and Instagram (@createdbyAI.art).
GPT-3 Prompt Insert Mode Examples
Q: What is CLIP+Diffusion AI Art Generation?
A: CLIP+Diffusion AI Art Generation is a method of creating art using artificial intelligence. This method involves using a neural network to generate images, and then using a diffusion process to create variations on those images.
Q: How should you write the text prompt for a art generation software to get ideal output for character art?
A: You should write a descriptive prompt that the computer can “understand”. A good way to look at it is like Alt (Alternative) text in HTML. It’s quite literal, descriptive, and to the point, so the machine doesn’t have to do so much guess work.
Q: Write a text prompt for a AI art generation software that would fit the art style of Kilian Eng
A: A scene of a
[insert] by Kilian Eng
Although Complete Mode is GPT-3’s default mode and feels like it should be the first example, this is the main prompt I use, because I feel it gives me a bit more control over the resulting prompt.
My thought process here was that I wanted GPT-3 to have some context on what I want from a prompt for text-to-image generation.
I asked the questions and let it answer them, since it’s more knowledgeable and articulate than I am, then tweaked the answers a little, somewhat as I would with a person.
I did this in GPT-3’s Complete Mode, then switched to Insert Mode.
Finally, I asked it to generate a prompt in the style of a well-known artist, because the artist named in the prompt can heavily influence the result.
Other GPT-3 settings I use are a temperature of 1 and a newline (↵) as the stop sequence, so it doesn’t go off and write excessively.
In this case I’m using Insert Mode because I want it to generate scenarios based on the art style of well-known artists, hoping to give Disco Diffusion prompts with scenarios that play to each artist’s strengths. This isn’t necessarily a great idea; it’s just something I’m trying out. Also, GPT-3 doesn’t always come up with scenarios that fit the artist’s style.
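For anyone scripting this instead of using the Playground, here is a minimal sketch of how the template above might map onto an API request. It assumes the legacy OpenAI Completions endpoint, where Insert Mode is expressed by putting the text before `[insert]` in `prompt` and the text after it in `suffix`. The model name `text-davinci-002` is also an assumption, and `build_insert_request` is a hypothetical helper. It only builds the request body and makes no network call; in practice you would prepend the Q&A context from the template too.

```python
import json

INSERT_MARKER = "[insert]"


def build_insert_request(template: str, model: str = "text-davinci-002") -> dict:
    """Split a Playground-style template at [insert] into prompt/suffix.

    Assumption: the legacy Completions API's Insert Mode, where the model
    fills the gap between `prompt` and `suffix`.
    """
    before, marker, after = template.partition(INSERT_MARKER)
    if not marker:
        raise ValueError(f"template must contain {INSERT_MARKER}")
    return {
        "model": model,     # assumed insert-capable model name
        "prompt": before,   # text before [insert]
        "suffix": after,    # text after [insert]
        "temperature": 1,   # setting from the article
        "stop": ["\n"],     # newline stop sequence from the article
    }


if __name__ == "__main__":
    template = (
        "Q: Write a text prompt for a AI art generation software that "
        "would fit the art style of Kilian Eng\n"
        "A: A scene of a[insert] by Kilian Eng"
    )
    print(json.dumps(build_insert_request(template), indent=2))
```

Keeping the artist in the suffix means every completion ends with the same “by …” modifier, no matter what scene GPT-3 invents.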
Here are some results:
A: A scene of a cyberpunk city, with bright lights and sharp geometric shapes. Think Blade Runner meets Tron. by Kilian Eng
A: When the last city fell, only the temple remained. A post-apocalyptic world by Simon Stålenhag.
A: A scene of a dark, dank swamp, with twisted trees and eerie glowing eyes staring from the shadows. by John Blanche
A: A house by a lake at sunset, with light shining through the windows and smoke coming from the chimney. by Thomas Kinkade
While generating images for this article, I accidentally gave Disco Diffusion the question as the prompt, instead of the answer (Write a text prompt for a AI art generation software that would fit the art style of Thomas Kinkade).
This is the output image for the text I initially wanted to use. It’s a bit underwhelming and the other one is much better. It seems Disco Diffusion had a slightly different interpretation than we expected this time.
A: A terrifyingly huge dragon with multiple horns, spikes running down its back, and a long tail sweeps across the sky, blotting out the sun. Underneath, a lone figure stands defiantly, weapon drawn, preparing to face the creature. by Michael Whelan
A: A magical kingdom where the sun never sets and the moon is always full. The trees are alive and the air is thick with magic. by Tyler Edlin
GPT-3 Prompt Complete Mode Examples
If you’ve ever used GPT-3, you’ve probably guessed how to use this mode. An important setting for me is a stop sequence, which is usually a newline.
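To make the stop-sequence idea concrete, this local sketch mimics what it does to a completion: the output is cut at the earliest occurrence of any stop string, and the stop string itself is not returned. `apply_stop` is a hypothetical helper for illustration only; the API applies stop sequences itself, so you wouldn’t normally need it.

```python
def apply_stop(text: str, stops: list[str]) -> str:
    """Cut generated text at the earliest stop sequence, excluding the stop itself."""
    cut = len(text)
    for stop in stops:
        if not stop:
            continue  # an empty stop string would cut everything; ignore it
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


if __name__ == "__main__":
    raw = "A cyberpunk city at night.\nQ: Write another prompt"
    print(apply_stop(raw, ["\n"]))
```

With a newline as the stop, a single-line image prompt comes back and GPT-3 doesn’t carry on inventing the next question.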
GPT-3 Question/Answer Prompt
In the following example I’m using a prompt similar to the one above. I tried to guide GPT-3 toward the type of prompt I’m looking for, and then let it do its thing.
The result is quite underwhelming, but it happens.
Sometimes I guide it a bit more in the direction I’d like. In this example I gave it “A cyberpunk”, and when it finished too soon I gave it “The city is” and let it complete the rest.
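The seed-and-continue loop described above can be sketched as follows. `guided_complete` is a hypothetical helper, and `complete` stands for any function that sends a prompt to GPT-3 and returns the generated text; the offline `fake_complete` stand-in exists only so the sketch runs without an API key, and its outputs are made up.

```python
from typing import Callable


def guided_complete(base_prompt: str, seeds: list[str],
                    complete: Callable[[str], str]) -> str:
    """Write each seed phrase, let the model continue it, and repeat.

    `complete` maps a prompt to generated text (hypothetical wrapper
    around your GPT-3 client). Returns only the text after `base_prompt`.
    """
    text = ""
    for seed in seeds:
        if text and not text.endswith(" "):
            text += " "  # separate the new seed from earlier output
        text += seed
        text += complete(base_prompt + text)
    return text


if __name__ == "__main__":
    # Offline stand-in for GPT-3 with invented continuations.
    def fake_complete(prompt: str) -> str:
        if prompt.endswith("A cyberpunk"):
            return " city at night."
        if prompt.endswith("The city is"):
            return " drenched in neon rain."
        return ""

    print(guided_complete("A: ", ["A cyberpunk", "The city is"], fake_complete))
```

Each call sees everything written so far, so the second seed continues the same scene instead of starting a new one.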
GPT-3 “Previous Successful Prompts” Prompt
Another approach is to give it a few examples of previously successful prompts, in the hope that it generates something in the same style, while you guide it in the direction you want.
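A minimal sketch of assembling such a few-shot prompt: earlier prompts that worked are numbered, and the next slot is left open, optionally started with a seed phrase. The header line and numbered-list layout are assumptions, not the exact format I use; `build_few_shot_prompt` is a hypothetical helper.

```python
def build_few_shot_prompt(examples: list[str], seed: str = "") -> str:
    """Number earlier prompts that worked, then leave the next slot open.

    The header line and numbered-list layout are assumptions; any
    consistent format should serve the same purpose.
    """
    lines = ["Text prompts for an AI art generator:"]
    lines += [f"{i}. {ex}" for i, ex in enumerate(examples, start=1)]
    # Open slot for GPT-3 to fill, optionally started with a seed phrase.
    lines.append(f"{len(examples) + 1}. {seed}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    previous = [
        "A house by a lake at sunset, with light shining through the "
        "windows and smoke coming from the chimney. by Thomas Kinkade",
        "A scene of a dark, dank swamp, with twisted trees and eerie "
        "glowing eyes staring from the shadows. by John Blanche",
    ]
    print(build_few_shot_prompt(previous, seed="A forgotten"))
```

Combined with the newline stop sequence, GPT-3 fills in just the open item and stops at the end of that line.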
Sometimes I guide it a little by starting with something like just
Here’s the same prompt, but with artists added as modifiers, the image scaled to 1024×1024, and steps increased to 500.
The prompt used is “A forgotten ruin in an overgrown jungle, inspired by H.P. Lovecraft’s At the Mountains of Madness by Li Shuxing and Tyler Edlin”.
Mentioning well-known artists can dramatically improve your output image, and increasing steps often makes images more detailed.
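Since the artist names matter so much, it can help to keep the scene and the artist list separate and join them at the end, which makes it easy to try the same scene with different artists. The hypothetical helper below reproduces the “by X and Y” phrasing of the prompt above; that phrasing is just the convention used in this article, not something Disco Diffusion requires.

```python
def with_artists(scene: str, artists: list[str]) -> str:
    """Append artist modifiers as 'by A', 'by A and B', or 'by A, B and C'."""
    if not artists:
        return scene
    if len(artists) == 1:
        names = artists[0]
    else:
        names = ", ".join(artists[:-1]) + " and " + artists[-1]
    return f"{scene} by {names}"


if __name__ == "__main__":
    scene = ("A forgotten ruin in an overgrown jungle, inspired by "
             "H.P. Lovecraft's At the Mountains of Madness")
    print(with_artists(scene, ["Li Shuxing", "Tyler Edlin"]))
```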
Disco Diffusion Parameters Used in This Article
In case you’re using Disco Diffusion and not getting results similar to mine, these are the settings I’m using:
eta: 0.9
steps: 250
clip_guidance: 15000
sat_scale: 2000
range_scale: 100
cut_ic_pow: 10
Models: ViTB32, ViTB16, RN101, RN50x4
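If you drive several runs from a script, it can help to keep these values in one place and derive variants from them, such as the 1024×1024, 500-step run mentioned earlier. This is a plain Python sketch: the key names mirror the labels above and may not match the variable names in your copy of the Disco Diffusion notebook, and `width_height` as the name of the size setting is an assumption. `variant` is a hypothetical helper.

```python
import copy

# Values from the list above; key names mirror those labels and may not
# match the variable names in your Disco Diffusion notebook version.
BASE_SETTINGS = {
    "eta": 0.9,
    "steps": 250,
    "clip_guidance": 15000,
    "sat_scale": 2000,
    "range_scale": 100,
    "cut_ic_pow": 10,
    "models": ["ViTB32", "ViTB16", "RN101", "RN50x4"],
}


def variant(base: dict, **overrides) -> dict:
    """Return a deep copy of `base` with `overrides` applied; `base` is unchanged."""
    settings = copy.deepcopy(base)
    settings.update(overrides)
    return settings


# Higher-resolution run: 1024x1024 ("width_height" is an assumed key) and 500 steps.
HIRES = variant(BASE_SETTINGS, steps=500, width_height=[1024, 1024])

if __name__ == "__main__":
    for key in sorted(HIRES):
        print(f"{key}: {HIRES[key]}")
```

The deep copy keeps the model list of each variant independent, so tweaking one run’s models can’t silently change the base settings.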
Some of the results above are underwhelming; others I’ve had in the past have been great. The main reason I use GPT-3 is for the parts of prompts it generates, because sometimes it comes up with things I wouldn’t have thought of.
That’s about it. I hope this helps someone use GPT-3 to generate text prompts for machine learning image generators. Feel free to leave a comment if you have any feedback, questions, or issues, and I’ll get back to you as soon as possible.