>_TheQuery
// Reading nowStart
← All Articles

Gemini Omni and Gemini 3.5 Flash Are Real. So Is the Thing Google Refused to Ship.

By Addy · May 20, 2026

Sundar Pichai walked onto the Google I/O stage with the most ambitious AI announcement slate the company has ever prepared. Two new model families. A unified multimodal architecture that collapses video, image, audio, and text into a single reasoning system. A Flash model that outpaces every comparable frontier model at a fraction of the cost. A redesigned Search that behaves less like a database and more like an agent. 900 million monthly active Gemini users.

The audience groaned anyway.

The groan came when Pichai mentioned Gemini 3.5 Pro -- the flagship model in the new family, the one the developer community had been anticipating since the 3.5 name was first spotted in API traces weeks ago. "I know you can't wait to get your hands on it," Pichai said. "Give us until next month."

The audience had given Google until I/O. Google asked for more time.

That moment -- a crowd that received genuinely impressive news and still left wanting -- is the most honest description of where Google stands in May 2026. The products that shipped are real and competitive. The product the room came for did not ship. And the most powerful capability inside what did ship was deliberately held back.

This is the Google I/O 2026 story that the benchmark tables do not tell.

Gemini 3.5 Flash: The Model That Actually Shipped

Start with what is real and available.

Gemini 3.5 Flash is Google's strongest agentic and coding model and it is generally available right now -- in the Gemini app, in AI Mode in Search, in Google Antigravity for developers, in Vertex AI and AI Studio for enterprise. Billions of people have access to it starting at launch. That distribution scale is not a feature. It is the product.

The benchmark numbers Google published are specific enough to take seriously. On GPQA Diamond -- the graduate-level scientific reasoning benchmark that tests PhD-level knowledge across biology, chemistry, and physics -- Gemini 3.5 Flash scores 90.4%. On MMMU-Pro, the multimodal reasoning benchmark, it scores 81.2%. On SWE-bench Verified, the standard agentic coding evaluation, it scores 78% -- outperforming Gemini 3.1 Pro on a benchmark where the Pro model was supposed to hold the lead.

The speed claim is the one Google is leading with: 4x faster than comparable frontier models, at less than half the cost. On Terminal-Bench 2.1, GDPval-AA, and MCP Atlas -- the three benchmarks most relevant to agentic workflows -- 3.5 Flash outperforms Gemini 3.1 Pro. A Flash model beating a Pro model on the benchmarks that matter for production agent deployments is the headline that the model family naming convention obscures. Flash was supposed to be the fast, cheap tier. Gemini 3.5 Flash is the fast, cheap tier that outperforms the previous generation's expensive tier.

This is the same dynamic this publication identified in Qwen3.6-27B matching Claude Opus 4.5 on reasoning benchmarks, in DeepSeek V4-Flash tying frontier models on standard software engineering at a fraction of the cost. The efficiency research is compressing what was once a capability gap between speed-optimized and quality-optimized models into something that no longer justifies a 10x price difference. Google's own model family is the latest proof.

The context window is unchanged from Gemini 3.1 -- one million tokens, with the same retrieval accuracy guarantees that made 3.1 Flash Lite the preferred choice for long-context enterprise deployments. Native multimodal input across text, image, audio, and video is built in. Function calling, structured output, and MCP server integration are all supported. For developers who have already built on the Gemini API, upgrading to 3.5 Flash requires a model name change and nothing else.

Gemini Omni: The Architecture That Changes What Video Generation Means

The more significant announcement at I/O is not the model that beat the benchmarks. It is the model that changed the category.

Gemini Omni is the culmination of three years of work that Google announced when it first described Gemini as a natively multimodal model. When Google launched Gemini in December 2023, the stated goal was a single neural network trained on text, image, audio, and video simultaneously -- a model that understood the world through multiple senses rather than text alone. The follow-through took two years of iteration through Veo, Nano Banana, Genie, and a dozen intermediate research releases.

Gemini Omni is what that goal looks like in production.

The architectural claim Google is making is that Omni does not stitch inputs together. It reasons across them. A user who uploads a photograph, adds a piece of audio, types a description, and asks for a video output is not giving the model four separate inputs that get processed and combined. They are giving the model one unified context that the system reasons across before generating output. The distinction matters because stitching produces outputs that feel assembled. Reasoning across produces outputs that feel coherent.

The first model in the family -- Gemini Omni Flash -- is rolling out to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and the Flow creative tool. YouTube Shorts and YouTube Create App users get access at no cost. Developer and enterprise API access follows in the coming weeks.

The capabilities confirmed at launch: video generation from any combination of text, image, audio, and existing video. Conversational editing through natural language commands. Digital avatar creation -- a video representation that looks and sounds like the user, built through a structured onboarding process that requires recording yourself and speaking a series of numbers aloud. Text rendering inside generated video, which Google specifically highlighted for advertising use cases where brand accuracy in generated content is a hard requirement. SynthID watermarking on every output, making every Omni-generated video cryptographically verifiable as AI-generated.

One number reveals the deployment reality: Flash clips are capped at 10 seconds. Google's product management director Nicole Brichtova described this explicitly as a deployment decision rather than a model constraint. The model can generate longer clips. Google has decided not to let it yet. That distinction -- between what the model can do and what Google will allow it to do -- is the thread that runs through the entire I/O announcement.

The Thing Google Refused to Ship

Two absences from the announcement define what I/O 2026 actually means.

The first is Gemini 3.5 Pro. Google confirmed it exists. Google confirmed it is being used internally. Google confirmed it performs better than 3.5 Flash on every dimension. The reason it did not ship at I/O is not technical -- it was ready enough for internal use. The reason is that Google wanted to widen access while managing compute demand, and 3.5 Pro at scale requires infrastructure that is not yet ready for the volume I/O would trigger on day one.

Pichai's "give us until next month" is a supply constraint, not a capability gap. That is a different kind of disappointment than a model that is not ready. It is a model that is ready and a company that cannot yet serve the demand shipping it would create. The groan was accurate.

The second absence is more deliberate and more important.

Speech editing inside existing videos -- the capability that would let a user modify what a real person says in an existing recording, with the audio seamlessly matching the speaker's voice -- was specifically held back at I/O. Brichtova told TechCrunch that this feature remains limited "until we feel like we're at a point where we can release it responsibly."

This is the third major AI capability this month that a major lab has built and declined to ship at full power. Anthropic held back Mythos Preview behind Project Glasswing because its cybersecurity capabilities were too dangerous for broad release. Microsoft pulled VibeVoice's TTS synthesis, redesigned it with watermarks and disclaimers, and reshipped it. Google built voice editing in existing video into Gemini Omni and decided not to enable it at launch.

The pattern is not coincidence. It is the industry's response to a specific threat model: voice editing of real people's existing recordings is the most direct path to deepfake audio that sounds like a specific individual. The digital avatar onboarding -- recording yourself speaking numbers aloud before the system will generate a video of you -- is the same logic applied to video. You cannot create a video avatar of another person because the system requires you to record yourself. The authentication barrier is the product. The capability exists. The gate exists because the capability without the gate is a deepfake generator.

What Gemini Omni Actually Competes With

The video generation market Google entered at I/O is not the same market Veo entered a year ago.

Seedance 2.0 from ByteDance currently leads most public video generation benchmarks with over 90% commercial usability. OpenAI shut down the Sora 2 consumer app in April, leaving the model available only through API access. Luma AI is building agentic video tools that can generate entire ad campaigns from a product image and a brief. The market has moved from "generate a short video from a text prompt" to "generate a campaign-ready asset from a strategic brief" in twelve months.

Gemini Omni's competitive positioning is not primarily about video quality. Any competent video model can generate a 10-second clip from a text prompt. Omni's positioning is about distribution and integration. The model ships into the Gemini app that 900 million people already use. It ships into YouTube Shorts, where user-generated content meets the largest video platform on Earth. It ships into Flow, Google's creative studio. It ships into Search, which processes more queries per day than any other information system in history.

ChatGPT Images 2.0 made the same distribution bet -- ship into the app that already has the users rather than building a standalone product that needs to find its own audience. Google is making the same bet at larger scale with a harder-to-replicate distribution surface. Gemini Omni will be used more than Sora not because it is better than Sora, but because it is already inside the product people open to search for things.

The advertising application Google highlighted is the commercial use case with the clearest near-term value. A brand that can generate product videos with accurate text rendering, consistent branding elements, and the right physics for how its product moves -- without a production studio, without actors, without a two-week post-production cycle -- is a brand that has just restructured its content creation budget.

The Efficiency Compression Google Is Walking Into

The I/O announcement did not happen in isolation.

Cursor shipped Composer 2.5 this week -- a model that matches Gemini 3.1 Pro on several agentic coding benchmarks at 0.50permillioninputtokens,builtonanopenweightKimiK2.5basewith850.50 per million input tokens, built on an open-weight Kimi K2.5 base with 85% of compute invested in post-training. DeepSeek V4-Pro is available at 1.74 per million input tokens with competitive programming performance that beats every closed model. Qwen3.6-27B matches Claude Opus 4.5 on reasoning benchmarks at a fraction of the cost.

Gemini 3.5 Flash's competitive positioning -- faster than frontier models, cheaper than frontier models, more capable than the previous Pro tier -- lands in a market where "faster and cheaper than frontier" is increasingly the baseline rather than the differentiator. The efficiency compression that has been running through every Chinese open-weight lab and every vertically-integrated IDE model for six months has not stopped because Google shipped a capable Flash model.

Gemini 3.5 Pro, when it ships next month, will be the real test of whether Google has a frontier capability advantage that justifies the distribution at scale it is building. 3.5 Flash winning against 3.1 Pro on agentic benchmarks is a meaningful result. The test that matters is whether the Pro model that does not yet exist in public hands demonstrates a genuine capability lead over what DeepSeek, Kimi, and Claude Opus 4.7 already offer.

The audience that groaned understood this. They were not ungrateful for 3.5 Flash. They came to find out whether Google has the frontier model. That question has been deferred to next month.

What Google I/O 2026 Actually Means

The honest read of the announcements is this: Google is building the distribution layer for AI faster than any other company on Earth, and it is doing so with models that are genuinely competitive without being definitively dominant.

900 million Gemini monthly active users is the number that matters more than any benchmark. Gemini 3.5 Flash shipping into Search, Android, YouTube, and Workspace on day one means the model reaches more users in its first week than most frontier models reach in their first year. The distribution advantage is structural and compounding -- every user who encounters Gemini in Search or YouTube without choosing to is a user who has been onboarded to the platform without a conversion funnel.

Gemini Omni consolidates Google's fragmented generative media strategy -- Veo, Nano Banana, Genie -- into a single branded surface that developers can build on and users can reach from the apps they already use. A developer who was previously choosing between Veo for video and Nano Banana for images and a separate TTS provider for audio now has one model, one API, one billing relationship.

What Google refused to ship -- 3.5 Pro and voice editing in Omni -- will define the next chapter. If 3.5 Pro arrives next month and demonstrates a genuine frontier capability advantage, I/O 2026 will look like a deliberate runway-building exercise that set up the real announcement. If 3.5 Pro arrives next month and is competitive but not dominant, the groan from the I/O audience will have been the accurate read all along.

The products that shipped are real. The thing the room came for did not ship. Both of those sentences are true, and the distance between them is where Google's next month will be decided.

Sources:

Previously on TheQuery: The Image That Doesn't Look Like AI Anymore and The Open Source AI Race Is No Longer a Side Project