Back in February, Google paused its AI-powered chatbot Gemini’s ability to generate images of people after users complained of historical inaccuracies. Told to depict “a Roman legion,” for example, Gemini would show an anachronistic group of racially diverse soldiers while rendering “Zulu warriors” as stereotypically Black.
Google CEO Sundar Pichai apologized, and Demis Hassabis, the co-founder of Google’s AI research division DeepMind, said that a fix should arrive “in very short order” — within the next couple of weeks. It ended up taking much, much longer than that (despite some Googlers pulling 120-hour workweeks!). But in the coming days, Gemini will once again be able to create pics showing people.
Well… sort of.
Only certain users — specifically those signed up for one of Google’s paid Gemini plans, Gemini Advanced, Business or Enterprise — will regain Gemini’s people-generating feature as part of an early access, English-language-only test. Google wouldn’t say when the test will expand to the free Gemini tier and other languages.
“Gemini Advanced gives our users priority access to our latest features,” a Google spokesperson told TechCrunch. “This helps us gather valuable feedback while delivering a highly-anticipated feature first to our premium subscribers.”
So what fixes did Google implement for people generation? According to the company, Imagen 3, the latest image-generating model built into Gemini, contains mitigations to make the images of people that Gemini produces more “fair.” For example, Imagen 3 was trained on AI-generated captions designed to “improve the variety and diversity of concepts associated with images in [its] training data,” according to a technical paper shared with TechCrunch. Google also claims that the model’s training data was filtered for “safety” and “review[ed] … with consideration to fairness issues.”
We asked for more details about Imagen 3’s training data, but the spokesperson would only say that the model was trained on “a large data set comprising images, text and associated annotations.”
“We’ve significantly reduced the potential for undesirable responses through extensive internal and external red-teaming testing, collaborating with independent experts to ensure ongoing improvement,” the spokesperson continued. “Our focus has been on rigorously testing people generation before turning it back on.”
Imagen 3 and Gems
In a spot of better news, all Gemini users will get Imagen 3 within the week — minus people generation for those who aren’t subscribed to Gemini Advanced.
Google says that Imagen 3 understands the text prompts it translates into images more accurately than its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and errors, Google claims, and is the best Imagen model yet for rendering text.
To allay concerns around the potential to create deepfakes, Imagen 3’s outputs, unlike those from Google’s Pixel Studio, will be marked using SynthID, an approach developed by DeepMind to apply invisible, cryptographic watermarks to media.
Alongside Imagen 3, Google’s rolling out Gems — albeit only for Gemini Advanced, Business and Enterprise users. Like OpenAI’s GPTs, Gems are custom versions of Gemini that can act as “experts” on topics. To create one, write instructions for a Gem, give it a name and you’re off to the races.
Here’s how Google describes them in a blog post:
“With Gems, you can create a team of experts to help you think through a challenging project, brainstorm ideas for an upcoming event, or write the perfect caption for a social media post. Your Gem can also remember a detailed set of instructions to help you save time on tedious, repetitive or difficult tasks.”