DALL-E 3 character consistency practice

Nov 5, 2023
Nov 7, 2023
This article discusses the problem of character consistency in DALL-E 3 and, through a series of experiments, tests a prompt-template approach that keeps a character's appearance more consistent across images, which makes it easier to create interesting content.

DALL-E 3 character consistency

Background

Character consistency is a long-standing problem: it exists in both "Midjourney" and "Stable Diffusion", and it has been discussed over and over. What matters now is finding a solution and learning how to avoid the problem.
Let's make our generated characters more precise and consistent so that they can be reused across many scenes, not just as one-off graphics for a single scene. Solving this well would make it much easier to create AI picture books.
This article grew out of a Reddit post on how to create consistent characters with DALL-E. Its author listed his experiments with DALL-E and, based on that practice, arrived at a very good solution to the character consistency problem. This article follows his approach, validates his conclusions, and looks at how to use DALL-E to create more fun and interesting things.

Practicing Character Consistency

Readers who are interested can read the guide on how to create consistent characters with DALL-E mentioned above. This article focuses mainly on practical application.
The character used in this article is described as "Digital painting of a lithe courtesan woman with a soft face, a hint of elven features, mystical green eyes, long straight black hair with flared points in front, light (nearly golden) skin, with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing red robes over basic cloth garments, [scenario]."
According to the author, maintaining character consistency requires a precise template, which can be roughly divided into the following parts.

Core character appearance

This means using a few precise words to define the character's specific features, such as face, hair, and body shape. For example: a lithe courtesan woman with a soft face, a hint of elven features, mystical green eyes, long straight black hair with flared points in front, and light (nearly golden) skin.

Simple worn and carried items

Simple accessories and carried items can also help define a character more specifically, allowing DALL-E to better handle the appearance without adding strange things on its own. In this case, the character is described as "wearing red robes over basic cloth garments" and does not carry any items.

Image Style

DALL-E 3 supports various image styles, such as "3D style", "modern style", "retro style" and so on, and you can choose freely among them. In this case, the chosen style is "digital painting". Following the author's guidance, additional style attributes can be added to better shape the character's look. Here they are "gradient shading, clean linework, vibrant palette, and stylized proportions."

Scene

The parts above provide the key information that pins down the character. Combined, they largely determine the character's look. The scene, by contrast, is the specific situation the character is placed in. For example, playing the flute on a mountain, playing the piano in a bamboo forest, washing clothes in a mountain stream, and dancing are all specific scenes that depict the character's actions.
Three points matter when describing a scene. First, it should take place "in a specific environment", such as the mountain or bamboo forest mentioned above. Second, the character should be engaged in some kind of action, preferably described with an active verb. Lastly, you can add "strong emotional descriptors", although this is optional and a matter of personal preference.
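The three scene ingredients above can be sketched as a small helper. This is only an illustration: the function name `build_scenario` and its parameters are hypothetical and do not come from the Reddit guide, and the word order (emotion first, then action, then environment) is an assumption.

```python
from typing import Optional


def build_scenario(action: str, environment: str, emotion: Optional[str] = None) -> str:
    """Combine an active-verb action, a specific environment, and an
    optional emotional descriptor into one scenario phrase."""
    phrase = f"{action} {environment}"
    if emotion:
        # The emotional descriptor is optional; prepend it when given.
        phrase = f"{emotion} {phrase}"
    return phrase


print(build_scenario("playing the flute", "on a mountain"))
print(build_scenario("washing clothes", "in a mountain stream", emotion="joyfully"))
```

The returned phrase is meant to be dropped into the [scenario] slot of the character description.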

Summarizing the Descriptions

As we can see, to achieve consistency in character portrayal, we need to combine the parts described above into a single description. This makes the character's definition more complete, and the resulting images more consistent.
Let's continue using the example of a lithe courtesan woman and create a complete instance. The description is as follows: "Digital painting of a lithe courtesan woman with a soft face, a hint of elven features, mystical green eyes, long straight black hair with flared points in front, light (nearly golden) skin, with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing red robes over basic cloth garments, shooting a bow."
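One way to keep the fixed parts of the description identical from prompt to prompt is to store them separately and only swap the scene. The sketch below is a minimal illustration under that assumption; the class `CharacterTemplate` and its field names are hypothetical, not something defined by DALL-E or by the Reddit guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterTemplate:
    medium: str            # image style, e.g. "Digital painting"
    core_appearance: str   # face, hair, body shape, skin
    style_attributes: str  # additional style descriptors
    worn_items: str        # simple worn and carried items

    def render(self, scenario: str) -> str:
        """Assemble the full description; only `scenario` varies."""
        return (
            f"{self.medium} of {self.core_appearance}, "
            f"with {self.style_attributes}. "
            f"{self.worn_items}, {scenario}."
        )


courtesan = CharacterTemplate(
    medium="Digital painting",
    core_appearance=(
        "a lithe courtesan woman with a soft face, a hint of elven features, "
        "mystical green eyes, long straight black hair with flared points in "
        "front, light (nearly golden) skin"
    ),
    style_attributes=(
        "gradient shading, clean linework, vibrant palette, "
        "and stylized proportions"
    ),
    worn_items="Wearing red robes over basic cloth garments",
)

print(courtesan.render("shooting a bow"))
```

Because the dataclass is frozen, the character definition cannot be changed by accident between scenes; a new scene is just another call to `render`.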
In "ChatGPT", select "GPT-4" and then choose "DALL-E 3". Next, paste the description above into the chat and wait for DALL-E 3 to generate the images.
Let's take a look at what DALL-E 3 generates from this description.
 
notion image
 
notion image
Across these two images, the courtesan character is quite consistent: the appearance, clothing, hair, and ears are all similar. The images are decent, though they still lack a certain something. Let's continue with more scenes.

Testing More Scenes

In addition to the scene of shooting a bow, I have conducted experiments with four other scenes. Let's demonstrate them one by one.

Dancing Scene

Let's have a dancing scene. The description is as follows: "Digital painting of a lithe courtesan woman with a soft face, a hint of elven features, mystical green eyes, long straight black hair with flared points in front, light (nearly golden) skin, with gradient shading, clean linework, vibrant palette, and stylized proportions. She is wearing red robes over basic cloth garments and is dancing."
Now let's take a look at the generated result (if you are not satisfied with it, you can ask DALL-E 3 to regenerate until you are).
notion image
notion image
As you can see, the dancing scene is quite beautiful and looks amazing. 😂

Washing Clothes in a Mountain Stream Scene

Let's have a scene of washing clothes in a mountain stream. The description is as follows: "Digital painting of a lithe courtesan woman with a soft face, a hint of elven features, mystical green eyes, long straight black hair with flared points in front, light (nearly golden) skin, with gradient shading, clean linework, vibrant palette, and stylized proportions. She is wearing red robes over basic cloth garments and is washing clothes in a mountain stream."
Now let's take a look at the generated result.
 
notion image
 
notion image
Isn't it wonderful? Haha.

Playing the Piano in the Bamboo Forest Scene

Next, let's have a scene of playing the piano in the bamboo forest. The description is as follows: "Digital painting of a lithe courtesan woman with a soft face, a hint of elven features, mystical green eyes, long straight black hair with flared points in front, light (nearly golden) skin, with gradient shading, clean linework, vibrant palette, and stylized proportions. She is wearing red robes over basic cloth garments and is playing the piano in the bamboo forest."
The generated images are as follows (this scene has been generated multiple times):
notion image
notion image
These two images are beautiful, and both are set in a bamboo forest. Unfortunately, neither shows the character playing the piano.
notion image
 
notion image
These two are slightly more enchanting, but the previous two are more elegant 😂

Scene of playing the flute on the mountaintop

Let's have one more scene: playing the flute on the mountaintop. The description is as follows: "Digital painting of a lithe courtesan woman with a soft face, a hint of elven features, mystical green eyes, long straight black hair with flared points in front, light (nearly golden) skin, with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing red robes over basic cloth garments, playing the flute on the top of the mountain."
The generated images are as follows:
 
notion image
 
notion image
They look good. Let's try again and have DALL-E 3 generate a few more.
 
notion image
notion image
notion image
notion image
Overall, I think it's pretty good 😂
After working through these five scenarios, the conclusion is that combining the core character appearance, the simple worn and carried items, and the image style described above is largely enough to keep a character consistent. This makes it much more convenient to build further AI drawing applications.

More character descriptions

The Reddit post also contains other character descriptions, which are listed here. Interested readers can use them to experiment with DALL-E 3.

Character Description 1

Digital painting of a tall, slender ageless elf wizard (flowing hair and sharp features) with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing a white and gold robe with leaf patterns and a necklace of large mala beads, [scenario]
notion image

Character Description 2

Digital painting of a tall, slender ageless elf wizard (flowing hair and sharp features) with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing a white and gold robe with leaf patterns and a necklace of large mala beads, [scenario]
notion image

Character Description 3

Digital painting of a girly halfling with tousled, shoulder-length bright red hair and a freckled round face with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing a blue sorcerer's traveling tunic and walking staff, [scenario]
notion image

Character Description 4

Digital painting of a rugged, tattooed dwarf warrior with thick, braided mahogany beard and a chiseled square face with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing sturdy heavy armor with a heater shield and battleaxe, [scenario]
notion image

Character Description 5

Digital painting of a shifty crimson-skinned tiefling rogue with slick, coal-black hair and youthful, sharp face with curled horns with gradient shading, clean linework, vibrant palette, and stylized proportions. Wearing brown leather armor with a bandolier of vials, [scenario]
notion image
Feel free to come up with more descriptions of your own.
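The descriptions above all end in a [scenario] placeholder, so turning one into a usable prompt is a plain string substitution. The following is a rough sketch of that step; the helper `fill_scenario` and the `TEMPLATES` dictionary are hypothetical names used only for illustration, and the two templates are copied from Character Descriptions 4 and 5.

```python
PLACEHOLDER = "[scenario]"

# Two of the character descriptions listed above, copied as-is.
TEMPLATES = {
    "dwarf warrior": (
        "Digital painting of a rugged, tattooed dwarf warrior with thick, "
        "braided mahogany beard and a chiseled square face with gradient "
        "shading, clean linework, vibrant palette, and stylized proportions. "
        "Wearing sturdy heavy armor with a heater shield and battleaxe, "
        "[scenario]"
    ),
    "tiefling rogue": (
        "Digital painting of a shifty crimson-skinned tiefling rogue with "
        "slick, coal-black hair and youthful, sharp face with curled horns "
        "with gradient shading, clean linework, vibrant palette, and stylized "
        "proportions. Wearing brown leather armor with a bandolier of vials, "
        "[scenario]"
    ),
}


def fill_scenario(template: str, scenario: str) -> str:
    """Replace the single [scenario] placeholder with a concrete scene."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one [scenario] placeholder")
    return template.replace(PLACEHOLDER, scenario)


for name, template in TEMPLATES.items():
    print(f"{name}: {fill_scenario(template, 'playing the flute on a mountain')}")
```

The placeholder check is a small safeguard: a description that has lost its [scenario] slot would otherwise be sent unchanged, with no scene at all.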

References