A few months ago, Google unveiled Imagen 3, its next-generation text-to-image generator, through a beta phase in the ImageFX platform. Now, it’s available to everyone as part of Google Gemini. Google claims the new model can create highly detailed and lifelike images and follow prompts more accurately. So, we tested Imagen 3, comparing it with OpenAI’s DALL-E 3, the image-generating AI on ChatGPT.
We gave the same prompts to Imagen 3 and DALL-E 3 and compared them on several criteria, including text rendering, animation styles, camera angles, and how closely they follow prompts. Here are our comparison results, highlighting which AI model performed better overall.
Note: In all the examples below, Imagen 3 is on the left and DALL-E 3 is on the right.
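We ran all of these tests through the Gemini and ChatGPT apps, but if you’d rather script the same head-to-head comparison, here is a minimal sketch of sending one prompt to both services. It assumes the OpenAI Python SDK’s images endpoint for DALL-E 3 and the Vertex AI SDK’s ImageGenerationModel for Imagen 3; the model IDs, authentication setup, and output handling shown here are illustrative and may differ from your environment.

```python
# Sketch: send the same prompt to DALL-E 3 (OpenAI) and Imagen 3 (Vertex AI).
# Assumes OPENAI_API_KEY is set and Google Cloud credentials are configured;
# the model IDs and project/location values below are assumptions.
from openai import OpenAI
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

PROMPT = (
    "Generate a realistic photo of a bustling city street at sunset, "
    "with reflections on wet pavement, capturing realistic lighting and shadows."
)

# --- DALL-E 3 via the OpenAI Images API ---
openai_client = OpenAI()
dalle = openai_client.images.generate(
    model="dall-e-3",
    prompt=PROMPT,
    size="1024x1024",
    n=1,
)
print("DALL-E 3 image URL:", dalle.data[0].url)

# --- Imagen 3 via the Vertex AI SDK ---
vertexai.init(project="your-gcp-project", location="us-central1")
imagen = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")
imagen_result = imagen.generate_images(prompt=PROMPT, number_of_images=1)
imagen_result.images[0].save(location="imagen3_city_street.png")
```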
1. Realistic City Street Test
We started by generating a realistic city street scene to evaluate the models’ handling of lighting and reflections. Here’s the prompt we provided to both models:
Generate a realistic photo of a bustling city street at sunset, with reflections on wet pavement, capturing realistic lighting and shadows.
And here are the results.

Right off the bat, you can see that ChatGPT’s DALL-E 3 struggles to create realistic-looking images. While it manages to generate reflections, the image still has an animated, illustration-like feel. This holds for all the subsequent prompts as well: DALL-E 3 tends to produce images that look more animated than those from Imagen 3 or Midjourney.
2. Camera Angle and Shot Composition Test
Next, we wanted to evaluate how well each AI could follow camera angles and shot suggestions. We provided the following prompt to both models:
Generate an image of a dog playing fetch in a park, a low-angle ultra-wide shot with the ball mid-air.

While I like the quality of the Gemini result, ChatGPT’s DALL-E 3 followed the instructions more accurately, capturing both the low-angle perspective and the ultra-wide shot. Gemini also respected the camera angle, but overall, ChatGPT did a better job of maintaining the specified angles and shot composition.
3. Human Skin Tone Test
Getting human skin tones right is challenging even for Midjourney, which is known for generating realistic images of people but often struggles with close-up shots. To test Imagen 3 and DALL-E 3’s capabilities, we provided this prompt:
Generate a close-up portrait of an elderly woman with wrinkles and glasses, in natural lighting with a blurred background.

As expected, ChatGPT’s DALL-E 3 produced an image that looks animated. Gemini’s result was noticeably better, but it was still easy to tell that the image was AI-generated.
4. Painting Style Test
All three previous examples focused on generating realistic images, which didn’t play to DALL-E 3’s strengths. To assess how well both AI image generators handle a painting style, we provided this prompt:
Generate a floating island in the sky with waterfalls cascading into the clouds, in the style of a surrealist painting

Both models performed well with this prompt. ChatGPT’s DALL-E 3 created an image with more intricate detail and a vibrant sheen, whereas Gemini’s result felt softer, with a more cohesive artistic style. The choice between them may come down to a preference for detailed, sharp imagery (DALL-E 3) or a more blended, dreamlike aesthetic (Gemini).
That said, Gemini actually followed the prompt better, producing an image that looks more like a painting and clearly depicts the waterfalls cascading into the clouds. ChatGPT, by contrast, seems to have a house style it likes to stick to, regardless of what you ask for.
5. Understanding Abstract Concepts
Next, we tested how well the models could interpret abstract concepts. Here’s one example prompt we provided:
Generate an image that shows the feeling of happiness represented as an abstract explosion of colorful swirls and shapes.

It’s hard to declare a winner in this category, but I personally prefer ChatGPT DALL-E 3’s result. More often than not, Gemini Imagen 3’s output feels like the opposite of the feeling the prompt describes, though you may see it differently.
6. 2D Animation Style and Cartoon Image Generation
We also tested the models’ ability to create images in a 2D animation style and cartoon-like appearance. Here’s an example prompt from our tests:
Generate an image of a 2D animation-style panda character, holding an umbrella in a rainstorm, with raindrops bouncing off the umbrella.
While I expected ChatGPT to excel in this area, I ran into difficulties right away. It initially produced 3D animation-style images, and only after re-prompting did it generate a 2D image. This happened repeatedly across different examples, so for this comparison we used the 2D animation image it eventually produced after several attempts.

Gemini often generates 2D images with more detail, while ChatGPT tends to render them as flatter, more cartoon-like illustrations. In the end, the choice between the two comes down to personal preference and the style you’re after; we prefer ChatGPT’s result here because it actually looks 2D, which is what we prompted for.
7. Generating Real-World People
We also tested whether Imagen 3 and DALL-E 3 could generate images featuring real-world people like Elon Musk or Donald Trump. Neither model will do it. Gemini immediately states that it cannot create images of real people, while ChatGPT first tries generating the image in different settings before eventually declaring that it cannot depict real individuals.
8. Historical Figures Test
Gemini’s image generator previously faced controversy for not generating images of white people, producing images of people of color even for prompts like "Founding Fathers of America". To see how the new model performs, we used the same prompt:
Generate a portrait of a founding father of America.

It appears this issue has been resolved: in our tests, both models produced images that were accurate and true to historical depictions.
9. Text Rendering Test
We then tested the text rendering capabilities, as many models often produce text that is hard to read or nonsensical. Both Google and OpenAI claim that their models have improved in this area, so we used the following prompt:
Generate an image of a brick wall covered with graffiti, with the word 'TechWiser' in vibrant colors and a grungy style.

In this example, both models rendered the text correctly. However, if the prompt doesn’t specify the exact text, both models still struggle. For instance, with this prompt:
An open book lying on a wooden table, with its pages clearly visible and well-lit. The words should be clear enough to read.

ChatGPT’s DALL-E 3 failed to render the text accurately, producing illegible words, while Gemini deviated from the prompt by making the text on the pages less visible, often obscuring or blurring it.
10. Detailed Prompt Test
Finally, we tested how well both AI image generators follow prompts that include a lot of specific details. Here’s an example of a detailed prompt we used:
Generate an image of a young female warrior with short, silver hair and piercing blue eyes, wearing intricately designed armor made of dark metal with red accents. She is holding a double-edged sword with runes engraved along the blade. A small scar runs across her left cheek. Behind her, a twilight sky fades from deep purple to orange, with the silhouette of a ruined castle in the distance. She stands on a rocky cliff with a black wolf by her side, its eyes glowing in the dim light.

Both models did a good job with this complex prompt, but there were notable differences in how they handled the details. ChatGPT’s DALL-E 3 missed a few elements, such as the scar on the left cheek and the red accents on the armor. Additionally, the character wasn’t depicted as holding the sword as specified.
Gemini captured every detail, including the scar, the red accents, and the precise purple-to-orange gradient of the twilight sky, resulting in a more accurate interpretation of the prompt.
11. In-Paint Editing
ChatGPT doesn’t just generate images; it also lets you edit them. To edit an image, select the generated image, click the paint option, and highlight the part you want to change. You can then provide a prompt, and the changes will apply only to that selected area. For example, here’s a skyline image I generated with ChatGPT.

If I now want a vibrant, orange sky, I can select just the sky and prompt ChatGPT to make it more vibrant. Here’s the edited image.

This kind of selective editing isn’t possible in Google Gemini yet. Imagen 3 is also noticeably slower at generating images than DALL-E 3.
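For those who want to replicate this kind of targeted edit outside ChatGPT, here is a rough sketch using OpenAI’s Images edit endpoint, which takes an image plus a mask whose transparent region marks the area to regenerate. Note that this endpoint has historically worked with DALL-E 2 rather than DALL-E 3, so it only approximates the in-paint flow described above; the file names and prompt are placeholders.

```python
# Sketch: mask-based image editing via the OpenAI Images API.
# The edit endpoint works with DALL-E 2, not DALL-E 3, so this only
# approximates ChatGPT's in-paint UI. "skyline.png" and "sky_mask.png" are
# placeholder files: both should be square PNGs, and the mask's transparent
# pixels mark the region to regenerate (here, the sky).
from openai import OpenAI

client = OpenAI()

edited = client.images.edit(
    model="dall-e-2",
    image=open("skyline.png", "rb"),
    mask=open("sky_mask.png", "rb"),
    prompt="the same city skyline, but with a vibrant orange sunset sky",
    n=1,
    size="1024x1024",
)
print("Edited image URL:", edited.data[0].url)
```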
Imagen 3 Outperforms DALL-E 3
Imagen 3 excels at generating more realistic-looking images and can adjust the animation style according to the prompt. In contrast, ChatGPT’s DALL-E 3 tends to adhere to its own style, even when different styles are requested. However, ChatGPT has its advantages—it is better at following camera angles and perspectives and can also edit generated images.
Both AI tools can generate images even on their free tiers, but with limitations:
- neither can generate images of real people
- both impose a daily limit on the number of images you can generate
Gone are the days when AI-generated images had glaring issues like characters with 10 fingers on one hand. Most images produced by these models are now accurate, making them valuable tools for content creators.