>_TheQuery

Is Bigger Still Better, or Just IPO Noise?

By Addy · March 28, 2026

Two AI labs. Two frontier models. One week. Both companies heading toward IPOs later this year. Both making claims that sound like they were written for a prospectus rather than a technical audience.

This is worth examining carefully, because the signals underneath the noise are genuinely interesting, and the noise itself tells you something important about where the industry actually is.


What Actually Happened

On March 24, The Information reported that OpenAI finished pretraining a new model codenamed Spud. Sam Altman told employees in an internal memo: "Things are moving faster than many of us expected." He described Spud as a model that could "really accelerate the economy." The company's product division was renamed "AGI Deployment." Sora - the AI video generator shut down less than a year after its debut - had its compute redirected to finish Spud's final training runs.

On March 26, Anthropic accidentally exposed nearly 3,000 internal documents through a CMS misconfiguration. Among them: draft blog posts revealing a model codenamed Capybara, publicly called Claude Mythos. Anthropic confirmed the leak. A spokesperson described Mythos as "the most capable we've built to date." Internal documents describe it as a new tier above Opus with "dramatically higher scores" on coding, academic reasoning, and cybersecurity. One draft warned that Mythos "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

Anthropic's leak was accidental. OpenAI's announcement was deliberate. The effect was identical: both companies now have an unreleased frontier model in the public conversation weeks before either ships.

Both companies are preparing for IPOs later in 2026.


The IPO Lens

This context matters because it changes how you should read every claim attached to both models.

OpenAI closed a $120 billion funding round at an $840 billion valuation, with losses projected at $14 billion for 2026. The path to a sustainable business requires either a model that justifies the valuation or a story that holds investor confidence long enough for the economics to change. Altman's "accelerate the economy" framing is not technical communication. It is investor communication delivered through an internal memo that was always going to leak.

Anthropic's situation is different but parallel. The Mythos leak was genuinely accidental. But the internal documents that leaked were written for a public audience: polished draft blog posts, benchmark comparisons, carefully constructed positioning. The language describing Mythos as capable of exploiting vulnerabilities "in ways that far outpace the efforts of defenders" is the kind of language that goes into a funding announcement, not an internal technical report.

Neither company is lying. Both companies are operating in a context where every model release is also a valuation argument. That context shapes what gets said, what gets emphasized, and what gets quietly omitted.


What Is Actually Known vs What Is Being Claimed

About Spud:

Known: pretraining is complete as of March 24. Release expected in weeks. Sora compute was redirected to finish training runs. The product organization has been renamed AGI Deployment. Spud is expected to serve as the foundation for OpenAI's desktop super app combining ChatGPT, Codex, and Atlas.

Not known: parameter count, architecture, specific benchmark results, whether it represents a genuine capability leap or an incremental improvement with better marketing. Employees were reportedly told it features "a completely novel capability not seen before" - which is either the most important sentence in this article or the most meaningless, and there is no way to know which without seeing the model.

About Mythos:

Known: it sits in a new tier above Opus. Anthropic confirmed the leak and called it their most capable model to date. Internal benchmarks show higher scores on coding, academic reasoning, and cybersecurity versus Claude Opus 4.6. Cybersecurity researchers are getting early access before general release. The model introduces a new, more expensive pricing tier above Opus.

Not known: the benchmark methodology, what models the cybersecurity claims compare against, the actual parameter count, and whether "far ahead of any other AI model in cyber capabilities" is a measured claim or marketing language that survived into a draft blog post.

The honest read: both models are probably genuinely capable. The specific superlatives attached to both are probably not.


Is Bigger Still Better?

This is the question the week actually raises - and the answer is more complicated than either company's framing suggests.

March 2026 has complicated the "bigger is better" thesis more thoroughly than any month before it. Nemotron 3 Super showed 12 billion active parameters outperforming models with 37 billion. Qwen3-Coder-Next showed 3 billion active parameters matching Claude Sonnet 4.5. TurboQuant showed a 6x memory reduction through software alone. Voxtral TTS showed a 4 billion parameter voice model matching the market leader on naturalness benchmarks.

The architectural efficiency thesis has produced real results across multiple domains this month. The models that win are not consistently the largest ones. They are the ones with the best routing, the most efficient memory usage, and the most precise context retrieval.

Into that context, both OpenAI and Anthropic are releasing what appear to be large, expensive frontier models with capabilities that scale with size. Mythos is explicitly described as more expensive to serve than current offerings. Spud required shutting down Sora to free compute. Both are the kind of models that require data center infrastructure to run.

That is not a contradiction. The architectural efficiency work and the frontier scaling work are solving different problems. Efficient small models handle the 80% of workloads that do not require frontier reasoning. Frontier models handle the 20% that do. Both matter. Both have a market.

The question is whether the frontier claims attached to Spud and Mythos will hold up when the models ship - or whether this is a week where the IPO timeline and the capability timeline happened to overlap, producing announcements that are larger than the underlying reality.


The Organizational Signals Are More Interesting Than the Model Claims

The most revealing thing about OpenAI this week is not Spud. It is the organizational moves around Spud.

Sora was shut down. A product with a billion-dollar Disney deal tied to it is gone, its compute redirected to finish the new model's training runs. That is a resource constraint, not a strategic vision.

The product organization has been renamed AGI Deployment. The safety team has been repositioned inside the research organization rather than above it. Both moves landed in the same week employees were told the new model features "a completely novel capability not seen before."

OpenAI is making a bet: that the capability jump from Spud justifies the organizational restructuring required to ship it on an IPO timeline. That bet may be correct. But the signals that it is being made for financial rather than technical reasons are real and worth naming.

Anthropic's accidental leak is, paradoxically, a better signal. You learn more about a company from what it writes when it does not expect anyone to read it than from what it publishes deliberately. The internal documents reveal a company genuinely concerned about the dual-use risks of its own model - one draft explicitly warned that Mythos could accelerate a cyber arms race against defenders. That language does not end up in accidental leaks unless the concern is real.


What This Week Actually Tells You

Two frontier models are coming. Both are probably capable. Neither has shipped benchmarks you can evaluate independently. Both companies have strong financial incentives to describe them in the most impressive terms possible before their IPOs.

The honest position is to wait. Not because the models will disappoint - they may not - but because the only claims worth taking seriously about any AI model are the ones that survive contact with independent evaluation.

The IPO context does not make Spud or Mythos bad models. It makes every unverifiable superlative attached to them less meaningful. "Accelerate the economy" and "step change" and "far ahead of any other AI model" are claims that will be tested when the models ship. Until then, they are investor communication.

The architectural efficiency work that TheQuery has been tracking all month - Nemotron, TurboQuant, Voxtral, Gimlet Labs - produced claims that were testable immediately and held up under testing. That is a different standard than what both companies offered this week.

Bigger may still be better for the problems that require frontier inference. Whether Spud and Mythos represent genuine capability leaps or carefully timed announcements is a question that ships with the models.


Sources:

Previously on TheQuery: The Model That Thinks With 12B Parameters but Knows Everything a 120B Model Knows and The VC Subsidy Behind Cheap AI Will Not Last - the efficiency and economics context this story sits inside.