The popular AI image-generating service Midjourney has deployed one of its most frequently requested features: the ability to recreate characters consistently across new images.
This has been a major hurdle for AI image generators to date, owing to the very nature of how they work.
That’s because most AI image generators rely on “diffusion models,” tools similar to or based on Stability AI’s open-source Stable Diffusion image generation algorithm. Roughly speaking, these models take a user’s text prompt and piece together an image that matches the description, drawing on patterns learned from similar imagery and text tags in their massive (and controversial) training data sets of millions of human-created images.
Why consistent characters are so powerful — and elusive — for generative AI imagery
Yet, as is the case with text-based large language models (LLMs) such as OpenAI’s ChatGPT or Cohere’s new Command-R, the problem with all generative AI applications is the inconsistency of their responses: the AI generates something new for every single prompt, even if the prompt is repeated or some of the same keywords are used.
This is great for generating whole new pieces of content (in Midjourney’s case, images). But what if you’re storyboarding a film, a novel, a graphic novel or comic book, or some other visual medium where you want the same character or characters to move through it, appearing in different scenes and settings with different facial expressions and props?
This exact scenario, typically necessary for narrative continuity, has so far been very difficult to achieve with generative AI. But Midjourney is now taking a crack at it, introducing a new tag, “--cref” (short for “character reference”), that users can add to the end of their text prompts in the Midjourney Discord. Midjourney will then try to match the character’s facial features, body type, and even clothing from a URL that the user pastes in after the tag.
As the feature progresses and is refined, it could take Midjourney further from being a cool toy or ideation source into more of a professional tool.
How to use the new Midjourney consistent character feature
The tag works best with images previously generated in Midjourney. So the workflow for a user would be to first generate a character, or retrieve the URL of one they’ve generated before.
Let’s start from scratch and say we are generating a new character with this prompt: “a muscular bald man with a beard and eye patch.”
We’ll upscale the image that we like best, then control-click it in the Midjourney Discord server to find the “copy link” option.
Then, we can type in a new prompt, “wearing a white tuxedo standing in a villa --cref [URL],” pasting in the URL of the image we just generated, and Midjourney will attempt to generate that same character in our newly described setting.
As you’ll see, the results are far from an exact match to the original character (or even our original prompt), but they’re definitely encouraging.
In addition, the user can control to some extent how closely the new image reproduces the original character by applying the “--cw” (character weight) tag followed by a number from 0 to 100 at the end of their new prompt, after the “--cref [URL]” string, like this: “--cref [URL] --cw 100.” The lower the “cw” number, the more variance the resulting image will have; the higher the number, the more closely the new image will follow the original reference.
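To make that concrete (with a placeholder standing in for a real image link), a prompt such as “relaxing on a beach at sunset --cref [URL] --cw 90” should hew closely to the reference character’s face, hair, and clothing, while “relaxing on a beach at sunset --cref [URL] --cw 10” gives Midjourney far more latitude to vary everything beyond the face.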
As you can see in our example, inputting a very low “cw 8” actually returns what we wanted: the white tuxedo. Though now it has removed our character’s distinctive eyepatch.
Oh well, nothing a little “vary region” can’t fix — right?
Ok, so the eyepatch is on the wrong eye…but we’re getting there!
You can also blend multiple characters into one by supplying more than one image URL after the “--cref” tag.
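For example (again using placeholders for real image links), a prompt along the lines of “two friends playing chess in a park --cref [URL1] [URL2]” asks Midjourney to blend traits from both referenced characters into the new scene.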
The feature went live earlier this evening, and artists and creators are already testing it. Try it for yourself if you have Midjourney, and read founder David Holz’s full note about it below:
Hey @everyone @here we’re testing a new “Character Reference” feature today. This is similar to the “Style Reference” feature, except instead of matching a reference style it tries to make the character match a “Character Reference” image.
How it works
- Type --cref URL after your prompt with a URL to an image of a character
- You can use --cw to modify reference ‘strength’ from 100 to 0
- Strength 100 (--cw 100) is default and uses the face, hair, and clothes
- At strength 0 (--cw 0) it’ll just focus on face (good for changing outfits / hair etc)
What it’s meant for
- This feature works best when using characters made from Midjourney images. It’s not designed for real people / photos (and will likely distort them as regular image prompts do)
- Cref works similarly to regular image prompts except it ‘focuses’ on the character traits
- The precision of this technique is limited, it won’t copy exact dimples / freckles / or tshirt logos.
- Cref works for both Niji and normal MJ models and also can be combined with --sref
Advanced Features
- You can use more than one URL to blend the information / characters from multiple images like this: --cref URL1 URL2 (this is similar to multiple image or style prompts)
How does it work on the web alpha?
- Drag or paste an image into the imagine bar; it now has three icons. Selecting these sets whether it is an image prompt, a style reference, or a character reference. Shift+select an option to use an image for multiple categories
Remember, while MJ V6 is in alpha, this and other features may change suddenly, but V6 official beta is coming soon. We’d love everyone’s thoughts in ideas-and-features. We hope you enjoy this early release and hope it helps you play with building stories and worlds.