
Gemini 2.0 Flash’s New Multimodal Image Gen Is My New Image Editor

by Ravi Teja KNTS
  • Google just released native image generation and editing with the Gemini 2.0 Flash Experimental model.
  • It is available in AI Studio for free right now.
  • Gemini uses its native multimodal capability to edit images. You can add and remove text and objects, change camera angles and colors, and more using simple text prompts.

Google has released an impressive update with Gemini 2.0 Flash Experimental. You can now not only generate images but also edit them with simple text prompts, and the model keeps the rest of the image consistent while making only the requested change.

There are plenty of AI image generators out there, with the likes of DALL·E 3 and Imagen 3 fighting for your time and money. While they are good at generating images, editing with them was sadly out of reach. These models were trained only to generate images, so instead of making changes, they usually ended up creating new ones from scratch.

Gemini is currently the only multimodal AI chatbot that can handle both text and images natively. That means when you ask Gemini to edit a generated image, it does so itself instead of routing the request to a specialized image diffusion model like Imagen 3.

Because it understands text and images within the same model, Gemini can pull off some impressive feats. Let’s break it down with some examples.

What’s New With Gemini 2.0 Flash Native Image Generation and Editing

Until now, when you asked an AI model to edit an image, it wouldn’t touch the original. It would regenerate the image entirely, leaving you with two distinct images.

For example, here’s ChatGPT’s response when I ask it to change a car’s color from black to red. Instead of changing the color, it generates an entirely new red car, with a different road, a different background, and even a different car model.

Now when I ask Gemini to change the car’s color from black to red, it makes only the required change: the color changes while the car model, road, and background all stay consistent.

Gemini uses its native multimodal capability to keep images consistent even when generating step-by-step instructions. For example, when you ask for a pasta recipe, Gemini will generate an image for each cooking step, keeping details like the bowl or pan consistent. You can even download these images for personal use.

This is still a beta feature and is currently not available directly inside Gemini. However, everyone can access it for free in Google AI Studio, Google’s platform for testing experimental models. Just hop onto the AI Studio website, select the Gemini 2.0 Flash Experimental model, and test it.
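If you’d rather script it than click around in the browser, the same model is also exposed through the Gemini API. Below is a minimal sketch using the google-genai Python SDK; the model name, the response_modalities setting, and the way images come back follow Google’s public docs at the time of writing, so treat them as assumptions and check the current documentation before relying on them.

```python
# A minimal sketch (not from the article) of calling the same model through
# the Gemini API with the google-genai Python SDK. Model name and
# response_modalities are assumptions based on Google's public docs.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # free key from AI Studio

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Generate an image of a classic red car parked on a coastal road.",
    # Ask for interleaved text and image output instead of text only.
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response interleaves text parts and inline image parts.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("car.png")
```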

Examples of Gemini 2.0 Flash Image Generation

We tested the feature in several different ways, and it delivered consistent results every time.

First, I asked the model to generate an image of vanilla ice cream. Later, I asked it to add chocolate syrup, and it did exactly that without changing anything—even the scoop was exactly the same as in the first image.

I also asked Gemini to change the camera angle, and it handled that perfectly. For example, I first generated an image of a classic red car; when I asked for a different camera angle, it produced a front view of the same car instead of the side view.

As I asked Gemini for more edits, the model kept making changes as requested: adding and removing items, changing placements, adjusting camera angles, and more.

It’s not just for generated images: you can also upload your own images and then edit them. In the example below, I asked the model to convert the image into a sunset scene with vivid colors, and it did that perfectly.
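The same kind of edit can be done through the API by passing your photo alongside the prompt. Here’s a minimal sketch under the same assumptions as the earlier snippet; the file name is just a placeholder.

```python
# A minimal sketch (same assumptions as above) of editing your own photo:
# the uploaded image rides along with the text prompt in `contents`.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")
source = Image.open("my_photo.jpg")  # hypothetical local file to edit

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=["Convert this scene into a sunset with vivid colors.", source],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Save the edited image returned as inline data.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("sunset_edit.png")
```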

Want to make your black-and-white image colorful? You can ask Gemini to do it.

You can also upload a reference image in a particular art style and ask Gemini to generate something new in that style, and the model replicates the style remarkably well.

Since Gemini is good with both text and images, you can now ask it to add text to images. Earlier, Gemini, like most AI models, struggled with adding and editing text inside an image.

Here’s Gemini generating a Happy Birthday card with a bunch of text exactly as requested.

As mentioned, Gemini uses its multimodal capability to generate consistent images in various ways. For example, here’s an entire story created by Gemini, with an image for each part of the story. Notice how the characters stay consistent throughout.

You can also request recipes with images for each step, and the model will maintain consistency throughout.
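Through the API, this kind of step-by-step request comes back as interleaved text and image parts. Here’s a short sketch, again under the same assumptions as the earlier snippets, that prints each step and saves its image:

```python
# Same assumed setup as the earlier sketches. A step-by-step prompt returns
# text steps interleaved with images, which are saved here individually.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Give me a simple pasta recipe with an image for each cooking step.",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

step = 0
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        step += 1
        Image.open(BytesIO(part.inline_data.data)).save(f"step_{step}.png")
```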

However, the model is not perfect. If you look closely at one of the recipes it generated, the model baked the cookies first and then placed them on the tray. Issues like this don’t happen often, but we did run into a few during testing. In another case, when I asked it to change the color of the car, it swapped out the entire car rather than just the color; when I tried again, it correctly changed only the color.
