Open Source AI Is No Longer a Side Project
By Addy · April 3, 2026
Google launched Gemma 4 on April 2. The coverage described it as a model family. The model is not the story.
The story is that the companies with the most to lose from open source are now its loudest advocates. And they are not doing it for altruistic reasons. They are doing it because closed, expensive, cloud-dependent AI is starting to look like a liability.
What Google Actually Shipped
Gemma 4 comes in four sizes: Effective 2B, Effective 4B, a 26B Mixture of Experts, and a 31B dense model. All four are available immediately on Hugging Face, Kaggle, and Ollama under the Apache 2.0 license. The 31B currently sits third on the Arena AI open-model leaderboard. The 26B sits sixth. Both outperform models many times their size.
The technical choices are worth noting. The larger models carry a 256,000-token context window. The smaller edge models support 128,000 tokens and add native audio input. All four support native function calling and structured JSON output, meaning developers no longer need to retrofit their applications to get the models to interact with external tools. Earlier Gemma versions required that extra work. Gemma 4 removes it.
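What native function calling removes can be shown in a few lines: the developer declares a tool schema, the model emits a structured call instead of free text, and the application dispatches it directly. The sketch below is illustrative only - it uses a generic JSON-schema tool definition and an OpenAI-style tool-call shape, and the exact format Gemma 4 emits may differ.

```python
import json

# Hypothetical tool schema in the common JSON-schema style.
# The wire format Gemma 4 actually uses may differ from this sketch.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return current conditions for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict:
    # Stand-in for a real API call.
    return {"city": city, "temp_c": 21, "conditions": "clear"}

def dispatch(tool_call_json: str) -> dict:
    """Parse a model-emitted structured tool call and route it to local code."""
    call = json.loads(tool_call_json)
    if call["name"] == "get_weather":
        return get_weather(**call["arguments"])
    raise ValueError(f"unknown tool: {call['name']}")

# A structured call as a natively tool-calling model might emit it.
model_output = '{"name": "get_weather", "arguments": {"city": "Nairobi"}}'
result = dispatch(model_output)
```

Without native support, the "retrofit" work is everything this sketch skips: prompting the model to imitate this format, then parsing and repairing its free-text attempts.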
The E2B model runs in under 1.5 gigabytes of memory via Google's LiteRT-LM runtime. That is a Raspberry Pi running a capable multimodal model. It is also a developer in a market where cloud access is expensive shipping a production AI application without paying per-token fees to any platform.
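The 1.5-gigabyte figure is plausible on back-of-envelope arithmetic alone: a ~2B-parameter model quantized to 4 bits per weight needs about a gigabyte for weights, leaving headroom for the KV cache and runtime. This is illustrative arithmetic, not Google's published breakdown.

```python
# Back-of-envelope memory footprint for a ~2B-parameter model.
# Illustrative only; the actual E2B figure depends on LiteRT-LM
# internals and per-layer details not published here.

params = 2_000_000_000          # ~2B weights
bits_per_weight = 4             # int4 quantization
weight_bytes = params * bits_per_weight // 8

gib = weight_bytes / 2**30
print(f"weights at int4: {gib:.2f} GiB")   # roughly 0.93 GiB

# That leaves a few hundred megabytes of a 1.5 GB budget for the
# KV cache, activations, and runtime overhead.
```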
But none of that is what makes this launch significant.
The License Change Is the Real Headline
Previous Gemma releases shipped under a custom Google license. It restricted certain commercial uses, imposed content policies, and created legal uncertainty for enterprises building products on top of the models.
Gemma 4 ships under Apache 2.0.
Apache 2.0 means no monthly active user caps. No acceptable-use enforcement. No restrictions that require a legal team to review before deployment. Full commercial freedom. The same license that Qwen 3.5 from Alibaba uses. More permissive than the license Meta uses for Llama 4.
Hugging Face co-founder Clement Delangue called it "a huge milestone." That is not marketing language. For enterprises building internal tools, for governments managing sensitive data, for developers in markets where cloud dependence is a strategic risk, the license is the product. Gemma 4's capabilities matter. Its legal status matters more.
Google spent years building Gemma as an open source model with a proprietary soul: technically available, commercially restricted. Gemma 4 is the first generation where that contradiction resolves.
The Chinese Lab Problem Google Is Not Saying Out Loud
The benchmark picture for Gemma 4 is strong but not dominant.
The 31B scores 89.2% on AIME 2026, 84.3% on GPQA Diamond, and 80% on LiveCodeBench v6. Those are serious numbers for a model in that size range. And Gemma 4 still trails Qwen 3.5 from Alibaba, GLM-5 from Zhipu AI, and Kimi K2.5 from Moonshot AI - all open-weight models from Chinese labs - though not by large margins.
The gap to OpenAI's GPT-OSS-120B is large and in Gemma's favor. Google's launch materials highlighted that comparison prominently. The comparison to the Chinese models appeared in smaller print.
This is the context Google is navigating. The open-weights frontier is no longer set by American labs. Chinese research teams are releasing capable models under permissive terms at a pace that has shifted the baseline assumptions of the entire ecosystem. Gemma 4 is Google's answer to that shift: a domestic, enterprise-grade alternative with a clean legal profile and without the data sovereignty concerns that come with routing workloads through foreign infrastructure.
The Register noted this directly: Google is offering enterprise customers a domestic alternative to Chinese open-weight models, but one that will not use sensitive corporate data to train future models. That framing - sovereignty, trust, data control - is the sales pitch underneath the benchmark numbers.
This Is Not the First Signal. It Is the Loudest.
To understand what Gemma 4 represents, it helps to look at what came before it.
In March, NVIDIA shipped Nemotron 3 Super: a sparse Mixture of Experts model that activates 12 billion parameters per token while carrying 120 billion total. It matched models three times its active size on reasoning benchmarks. The architecture uses a hybrid Mamba-Transformer backbone that reduces memory overhead without trading away accuracy. NVIDIA released the weights, the training data, the recipes, and the fine-tuning pipelines. Not just the model: the complete methodology.
That was not typical behavior for a company whose business is selling hardware. It was a deliberate signal: that the future of AI infrastructure involves open models running efficiently on distributed hardware, and NVIDIA intends to be the infrastructure layer underneath all of it.
Google's move with Gemma 4 is the same signal from a different position. Both companies are making the same bet - that open, efficient models running on owned hardware are where enterprise AI is going - and acting on it before the market fully confirms it.
The pattern is legible now. Closed frontier models dominate the top of the capability chart. Open efficient models are eating the bottom and middle, which is where most real-world AI workloads actually live. The companies that understand this are building for both simultaneously.
Efficiency Is Not a Consolation Prize
There is a persistent assumption in AI coverage that open and efficient models are for developers who cannot afford the real thing. That assumption is wrong, and the data from the past six months makes it harder to defend.
A 31B parameter model ranking third among all open models globally is not a compromise. It is an architectural achievement. The techniques behind it - sparse expert routing, dual rotary position embeddings, shared key-value caching, alternating local and global attention - are not cost-cutting measures. They are engineering choices that produce more capable models per unit of compute consumed.
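The first of those techniques, sparse expert routing, can be sketched in a few lines: a small gating network scores every expert per token, and only the top-k experts actually run, so compute scales with k rather than with the total parameter count. This is a generic top-k MoE sketch, not Gemma 4's actual layer - real implementations add load balancing, shared experts, and fused kernels.

```python
import numpy as np

def moe_forward(x, experts_w, gate_w, k=2):
    """Minimal sparse expert routing: a gate picks the top-k experts
    per token, and only those experts' weights are applied."""
    logits = x @ gate_w                         # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, topk[t]]
        weights = np.exp(chosen) / np.exp(chosen).sum()  # softmax over top-k
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ experts_w[e])  # only k of n_experts run
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
experts = rng.standard_normal((n_experts, d, d))
gate = rng.standard_normal((d, n_experts))
x = rng.standard_normal((tokens, d))
y = moe_forward(x, experts, gate)
```

The payoff is the ratio in the Nemotron example earlier: with k experts active out of many, a model can carry 120B parameters of knowledge while spending only 12B parameters of compute per token.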
This matters beyond benchmarks. The AI industry's current cost structure is not stable. Training runs consume electricity at a scale that is already drawing regulatory attention in multiple jurisdictions. Inference at scale costs more than most public pricing reflects, with the gap filled by VC subsidy that will not hold indefinitely. The models that survive the next phase of the industry are the ones that produce good results cheaply enough to run profitably.
Efficiency is not a feature. It is a survival condition.
The E2B model running in under 1.5 gigabytes is not a demo for edge developers. It is a preview of what sustainable AI deployment looks like: capable enough to be useful, cheap enough to run without a funding round behind it.
What This Means for Builders
Three things follow from this for developers and companies making infrastructure decisions today.
The first is that the open-model ecosystem is now a serious option for production workloads. Gemma 4's Apache 2.0 license removes the legal ambiguity that made previous open models hard to deploy in regulated industries. A hospital, a bank, a government agency can now run Gemma 4 on their own infrastructure, modify it for their domain, and deploy it without a compliance review triggered by licensing uncertainty. That was not true six months ago.
The second is that on-device and on-premises deployment is moving from a niche preference to a mainstream option. The E2B and E4B models are the foundation for Gemini Nano 4, Google's next on-device model for Android. Code written for Gemma 4 today will run on Gemini Nano 4-enabled devices when they ship later this year. Developers building on Gemma 4 are not building for a limited experiment. They are building for a deployment target that will reach billions of Android devices.
The third is that the open-source AI race has changed who is setting the pace. Meta and Mistral were the reference points for open-weights development a year ago. Today the models setting benchmarks are coming from Alibaba, Zhipu AI, and Moonshot AI. Google and NVIDIA are responding. The ecosystem that emerges from this competition will look different from the one that existed when Llama 2 was the ceiling.
The Uncomfortable Benchmark Truth
Gemma 4 is excellent for its size. It is not the best open model available.
That gap matters less than the trajectory. Gemma 3's BigBench Extra Hard score was 19.3 percent. Gemma 4's 31B model scores 74.4 percent on the same benchmark. That is not incremental progress. That is a signal that the efficiency-focused approach - smaller models, smarter architectures, better training pipelines - is compounding faster than raw parameter scaling.
The companies winning the open-source race are not winning by making the biggest model. They are winning by making the most capable model that a developer can actually afford to run. That competition has no ceiling, and it is accelerating.
Sources:
- Gemma 4: Byte for byte, the most capable open models - Google DeepMind
- Google's bold Gemma 4 bet targets Meta's hold on developers - Rolling Out
- Google battles Chinese open weights models with Gemma 4 - The Register
- What Is Google Gemma 4? Architecture, Benchmarks, and Why It Matters - WaveSpeed AI
- Google launches Gemma 4: four open-weight models - TNW
- Announcing Gemma 4 in the AICore Developer Preview - Android Developers Blog
- Gemma 4: Google's New Open Source LLMs Lag Behind Chinese Competitors - Trending Topics
Previously on TheQuery: The Model That Thinks With 12B Parameters but Knows Everything a 120B Model Knows and The VC Subsidy Behind Cheap AI Will Not Last