
Back in November, I tested the image generation capabilities inside Google’s Gemini, which was powered by the Imagen 3 model. While I liked it, I ran into its limitations fairly quickly. Google recently rolled out its successor, Imagen 4, and I’ve been putting it through its paces over the last couple of weeks.
I think the new version is definitely an improvement, as some of the issues I had with Imagen 3 are now thankfully gone. But some frustrations still remain, meaning the new version isn’t quite as good as I’d like.
So, what has improved?

The quality of the images produced has generally improved, though the improvement isn’t massive. Imagen 3 was already generally good at creating images of people, animals, and scenery, but the new version consistently produces sharper, more detailed images.
When it comes to generating images of people, which is only possible with Gemini Advanced, I had persistent issues with Imagen 3 where it would create cartoonish-looking photos, even when I wasn’t asking for that specific style. Prompting it to change the image to something more realistic was often a losing battle. I haven’t experienced any of that with Imagen 4. All the images of people it generates look very professional, perhaps a bit too much so, which is something we’ll touch on later.
One of my biggest frustrations with the older model was the limited control over aspect ratios. I often felt stuck with 1:1 square images, which severely limited their use cases. I couldn’t use them for online publications, and printing them for a standard photo frame was out of the question.
While Imagen 4 still defaults to a 1:1 ratio, I can now simply prompt it to use a different one, like 16:9, 9:16, or 4:3. This is the feature I’ve been waiting for, as it makes the images much more versatile and usable.
Imagen 4 also works much more smoothly. While I haven’t found it to be noticeably faster (although a faster model is reportedly in the works), there are far fewer errors. With the previous version, Gemini would sometimes show an error message, saying it couldn’t produce an image for an unknown reason. I’ve received none of those with Imagen 4. It just works.
Still looks a bit too retouched
While Imagen 4 produces better images, is more reliable, and allows for different aspect ratios, some of the issues I encountered when testing its predecessor are still present.
My main problem is that the images often aren’t as realistic as I’d like, especially when creating close-ups of people and animals. Images tend to come out quite saturated, and many feature a prominent bokeh effect that professionally blurs the background. They all look like they were taken by a photographer with 15 years of experience instead of by me, just pointing a camera at my cat and pressing the shutter.
Sure, they look good, but a “casual mode” would be a fantastic addition: something more realistic, where the lighting isn’t perfect and the subject isn’t posing like a model. I prompted Gemini to make an image more realistic by removing the bokeh effect and generally making it less perfect. The AI did try, but after I prompted it three or four times on the same image, it seemed to reach its limit and said it couldn’t do any better. Each new image it produced was a bit more casual, but it was still quite polished, clearly hinting that it was AI-generated.
You can see that in the images above, going from left to right. The first one includes a strong bokeh effect, and the man has very clear skin, while the other two show the man looking progressively older and more tired. He even started balding a bit in the last image. It’s not what I really meant when prompting Gemini to make the image more realistic, though it does come out more casual.
Imagen 4 does a much better job with random images like landscapes and city skylines. These images, taken from afar, don’t include as many close-up details, so they look more genuine. Still, it can be hit and miss. An image of the Sydney Opera House looks great, although the saturation is bumped up quite a bit: the grass is extra green, and the water is a picture-perfect blue. But when I asked for a picture of the Grand Canyon, it came out looking completely artificial and wouldn’t fool anyone into thinking it was a real photo. It did perform better after a few retries, though.
Editing is better, but not quite there
One of my gripes with the previous version was its clumsy editing. When asked to change something minor, like the color of a hat, the AI would do it, but it would also generate a brand new, completely different image. The ideal scenario would be to create an image and then be allowed to edit every detail precisely, such as changing a piece of clothing, adding a specific item, or altering the weather conditions while leaving everything else exactly as is.
Imagen 4 is better in this regard, but not by much. When I prompted it to change the color of a jacket to blue, it created a new image. However, when I specifically asked it to keep all other details the same, it managed to maintain a lot of the scenery and subject from the original. That’s what happened in the examples above. The woman in the third image was the same, and she appeared to be in the same room, but her pose and the camera angle were different, making it more of a re-shoot than an edit.
Here’s another example of a cat eating a popsicle. I prompted Gemini to change the color of the popsicle, and it did, while keeping a lot of the details. The cat’s the same, and so is most of the background. But the cat’s ears are now sticking out, and the hat is a bit different. Still, a good try.
Despite its shortcomings, Imagen 4 is a great tool
Even with its issues and a long wishlist of missing functionality, Imagen 4 is still among the best AI image generators available. Most of the problems I’ve mentioned are also present in other AI image-generation software, so it’s not as if Gemini is behind the competition. It seems there are significant technical hurdles that need to be overcome before these types of tools can reach the next level of precision and realism.
Other limitations are still in place, such as the inability to create images of famous people or generate content that violates Google’s safety guidelines. Whether that’s a good or a bad thing is a matter of opinion. For users seeking fewer restrictions, there are alternatives like Grok.
Have you tried out the latest image generation in Gemini? Let me know your thoughts in the comments.