
The Image That Doesn't Look Like AI Anymore

By Addy · April 22, 2026

Two years ago, you could spot an AI-generated image by looking at the menu.

Not the composition, not the lighting, not the faces. The menu. Every AI image model that tried to render text inside an image produced something that looked like it had been typed by someone having a medical episode. "Churiros." "Burrto." "Enchuita." The content was plausible. The text was a giveaway every time.

OpenAI shipped ChatGPT Images 2.0 on April 22. When asked for a Mexican restaurant menu, it produces something a real restaurant could print and hand to customers without anyone looking twice. The text is correct. The layout holds. The prices are readable. The only thing that might raise an eyebrow is ceviche at $13.50 -- but that's a menu decision, not a model failure.

That shift -- from obviously artificial to functionally indistinguishable -- is not a small update. It is a capability threshold that opens the door to an entirely different set of industries.


What Images 2.0 Actually Ships

ChatGPT Images 2.0 runs on the new gpt-image-2 model and is available today across ChatGPT, Codex, and the API. Access rolls out to all ChatGPT users starting Tuesday; paid users also get the more capable Thinking mode.

The model operates in two modes. Instant generates quickly and is suitable for most creative and production tasks. Thinking is slower and more deliberate -- it reasons through the request, can search the web during generation, and double-checks its own outputs before returning a result. Thinking mode can also generate up to eight distinct images from a single prompt, with characters, objects, and styles held consistent across all scenes.

The technical specs: outputs up to 2K resolution, flexible aspect ratios from 3:1 to 1:3, multi-panel generation, and support for non-Latin scripts including Japanese, Korean, Chinese, Hindi, and Bengali -- closing a gap that made previous models functionally unusable for large parts of the global market.

API pricing runs $0.211 per high-quality 1024x1024 image. At larger formats -- 1024x1536 -- it is actually cheaper than its predecessor: $0.165, versus $0.20 for `gpt-image-1.5`. Low-quality outputs drop to $0.006. The model's knowledge cuts off at December 2025.
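
For reference, here is what a call might look like if the API keeps the shape of OpenAI's current Images API -- a minimal sketch, where only the model name comes from the launch; the parameter names and response handling are assumptions carried over from the existing interface.

```python
# Sketch of a gpt-image-2 request. The request/response shape is an
# assumption carried over from OpenAI's current Images API; only the
# model name comes from today's launch.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-2",
    prompt="A printed Mexican restaurant menu with legible dish names "
           "and prices, folded tri-panel, warm paper texture",
    size="1024x1536",  # the large format priced at $0.165 per image
    quality="high",    # "low" drops the per-image cost to $0.006
)

# The current API returns base64-encoded image bytes.
with open("menu.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```

At these prices, a hundred-asset run at the large format costs about $16.50 -- which is the economics underneath everything in the next section.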

The one thing that did not ship: an answer about what is under the hood. OpenAI declined to specify what kind of model architecture powers Images 2.0, confirming only that it has "thinking capabilities" and leaving the underlying architecture uncharacterized.


The Text Problem Is Solved

Every previous generation of AI image models had the same structural weakness: text inside images was unreliable. Logos would have letters transposed. Signs would contain words that almost existed. Product labels would have ingredients that read correctly from a distance and made no sense up close.

This was not a stylistic limitation. It was an architectural one. Diffusion models, which dominated image generation for years, do not process language the way language models do. They approximate visual patterns from training data. Text is a visual pattern, but an unusually precise one -- a single wrong pixel in a letter changes its meaning. The models were not built to care about that distinction.

Images 2.0 reaches 95%+ text rendering accuracy on benchmarks. The practical difference is immediate: signs read correctly, product labels hold their text, UI mockups display readable copy, watch faces show the time described in the prompt, comic book speech bubbles contain actual sentences.

For the industries where text accuracy was the blocking issue -- advertising, packaging design, editorial graphics, restaurant menus, wayfinding -- the blocker is gone.


The Industries That Change First

Advertising and marketing. A creative team producing a campaign across twelve markets and four languages previously needed human designers to recreate each asset with correct local text. Images 2.0 generates localized advertising assets with accurate multilingual text in a single pass. The campaign that took a week of design work now takes an afternoon of prompt iteration. OpenAI specifically lists localized advertising as a target use case.
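
As a sketch of what that afternoon of prompt iteration can look like in a pipeline -- the locales, headlines, and request shape below are illustrative assumptions, not OpenAI's published workflow:

```python
# Illustrative localization loop: one banner per market, with the
# headline rendered in the local language. Locales and headlines are
# made up for the sketch; the request shape assumes the current API.
import base64

from openai import OpenAI

client = OpenAI()

HEADLINES = {
    "en-US": "Summer sale -- 30% off everything",
    "ja-JP": "サマーセール -- 全品30%オフ",
    "ko-KR": "여름 세일 -- 전 품목 30% 할인",
    "hi-IN": "समर सेल -- सब कुछ 30% छूट पर",
}

for locale, headline in HEADLINES.items():
    result = client.images.generate(
        model="gpt-image-2",
        prompt=(
            "Retail banner ad with a bold sans-serif headline reading "
            f"exactly: '{headline}'. Same layout, palette, and logo "
            "placement as the other variants in this campaign."
        ),
        size="1536x1024",
    )
    with open(f"banner_{locale}.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
```

The interesting part is not the loop; it is that the text inside each banner can now be trusted without a native speaker proofreading every pixel.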

E-commerce and product photography. Product images with accurate text on labels, logos, and packaging can now be generated at scale. A brand managing thousands of SKUs can generate product photography with readable ingredient panels, correct color palettes, and consistent logo reproduction across the entire catalog. This directly competes with commercial product photography studios for catalog-level content.

Publishing and editorial. The Thinking mode's eight-image batch generation with cross-frame consistency unlocks a full page of manga from a single picture-and-text prompt, a series of social media graphics with a consistent visual language, and illustrated editorial content that previously required an illustrator's time across multiple sessions.

UI and product design. Browser windows, mobile screens, SaaS dashboards, and app mockups now generate at a quality that communicates product flows before anyone opens Figma. This is not replacing Figma -- it is replacing the whiteboard phase that comes before it. Teams can visualize and debate product direction visually before a single design file is created.

Education and infographics. Infographics with legible data labels, educational diagrams with accurate text annotations, and illustrated explainers now generate reliably. The combination of web search during generation and accurate text rendering means the model can pull current data and render it visually in a single step.


Thinking Is the New Architecture

The more significant shift in Images 2.0 is not the improved photorealism. It is the reasoning layer.

Previous image models received a prompt and generated output. The relationship was direct: input text, output pixels. The model had no awareness of whether its output was correct, no ability to verify claims in the prompt, and no mechanism for iterating on its own work before returning a result.

Images 2.0's Thinking mode changes this sequence. The model reasons through the request before generating, can search the web to verify facts or pull current information, generates the image, and checks the output against the prompt before returning it. The workflow that previously required a human in the loop -- prompt, generate, check, reject, reprompt -- now has an internal loop that runs before the output reaches the user.
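
To see what got internalized, here is the old workflow approximated client-side -- a minimal sketch in which a vision model plays the checker, the PASS/FAIL convention is invented for illustration, and only the image model's name comes from the launch:

```python
# The loop Thinking mode now runs internally, approximated from the
# outside: generate, check the output against the prompt, reprompt on
# failure. Model names and the PASS/FAIL convention are assumptions.
import base64

from openai import OpenAI

client = OpenAI()

def generate_verified(prompt: str, max_attempts: int = 3) -> bytes:
    feedback = ""
    for _ in range(max_attempts):
        image_b64 = client.images.generate(
            model="gpt-image-2",
            prompt=prompt + feedback,
        ).data[0].b64_json

        # Ask a vision-capable model whether the output matches the request.
        verdict = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Does this image satisfy: '{prompt}'? "
                             "Answer PASS, or FAIL plus what is wrong."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }],
        ).choices[0].message.content

        if verdict and verdict.strip().upper().startswith("PASS"):
            return base64.b64decode(image_b64)
        feedback = f" Fix the following: {verdict}"

    return base64.b64decode(image_b64)  # best effort after max_attempts
```

Thinking mode's claim is that this entire outer loop -- plus the web search the checker cannot do here -- collapses into one call.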

This is the same architectural shift that defined the move from GPT-4 to reasoning models in text generation. The model is not just pattern-matching against training data. It is planning, generating, and evaluating. Images are now part of that chain.

The practical consequence: tasks that were previously impossible to automate -- "generate a marketing image that references this week's news and includes our logo in the correct position" -- can now run unattended in a pipeline. The model can search, compose, render, and verify in a single invocation.


The Photorealism Question

There is an uncomfortable side of this capability that is worth naming directly.

Images 2.0 produces images that are, in demonstrable cases, indistinguishable from photographs. Early testers reported that portraits fooled colleagues on first inspection -- skin texture, lighting, reflections, hands rendered correctly. The "AI look" -- the oversmoothed skin, the perfect light, the slightly uncanny composition that marked every previous generation -- is largely gone.

OpenAI ships the model with watermarking and content policies. These are real constraints and not merely performative. But watermarks can be removed, and content policies are enforced at the API level, not at the pixel level. A model that can produce convincing fake photographs is a different kind of tool than one that produces obvious AI art, and the uses people find for it will reflect that difference.

This does not make the model bad or the launch irresponsible. It makes the threshold real and worth acknowledging. The same capability that makes a restaurant menu indistinguishable from a real one makes a fake photograph of a real person harder to detect. Both things are true simultaneously, and only one of them makes it into most coverage of today's launch.


Where This Sits in the Competitive Picture

Google's Nano Banana Pro generated over one billion images in its first 53 days after launching earlier this year. Midjourney holds its aesthetic quality advantage for artistic work. Adobe, Figma, and Canva have already been named as integration partners for Images 2.0 -- and Adobe and Figma shares still dropped two percent on the pre-announcement.

The competitive frame is not "which model produces more beautiful images." It is "which model produces images that can be used in production workflows without human correction." By that measure, Images 2.0 claims the lead -- specifically because of text rendering, multilingual support, and the reasoning loop that reduces the iteration required to get to a usable output.

Midjourney makes art. Images 2.0 makes assets. Those are different products for different buyers, and the asset market is larger.


The New Default

The question the creative industry has been asking for three years is: when will AI image generation be good enough to use in production without a designer's intervention?

For the use cases where text accuracy was the binding constraint -- which is most commercial design work -- the answer as of today is: now.

That does not mean designers are unnecessary. It means the role shifts. The senior creative director who spent three hours supervising a junior designer's logo placement on a hundred product images will spend those three hours on something that cannot be automated yet. What that something is will define the next phase of creative work.

The image that doesn't look like AI anymore is not the end of the story. It is the beginning of a different one -- about what human creative judgment is actually for, now that the mechanical parts of the job are handled.

