StartClaude Mythos Cracked Apple M5 Security in Five Days. Apple Spent Five Years Building It.
By Addy · May 17, 2026
Five years. Billions of dollars. Thousands of Apple engineers. Arm's Memory Tagging Extension, repurposed and refined into a hardware-level memory safety system called Memory Integrity Enforcement. MIE was the culmination of everything Apple had learned from watching every public exploit chain against modern iOS get stopped, redirected, or rendered unusable. It was the strongest mass-market memory defense ever shipped on a consumer device.
On April 25, 2026, three researchers at a security firm called Calif sat down with a preview version of Claude Mythos. By May 1, less than a week later, they had a working exploit that bypassed MIE on bare-metal M5 hardware and escalated from an unprivileged local user account to a root shell using nothing but standard system calls.
Five years. Under a week.
That gap is the story.
What Memory Integrity Enforcement Actually Is
Before the exploit makes sense, the defense does.
Memory corruption is the oldest category of software vulnerability. The concept is simple: programs store data in memory. If an attacker can write data to the wrong memory location, they can overwrite the program's instructions and make it do something it was not supposed to do. This class of bugs has been responsible for the majority of serious operating system exploits for four decades. Buffer overflows, use-after-free, heap corruption: every variant is some version of the same underlying problem: a program touching memory it should not be able to touch.
The industry's response has been layered mitigations. Address Space Layout Randomization makes it harder to predict where code lives in memory. Stack canaries detect certain overwrite patterns before they execute. Control Flow Integrity limits which functions can call which other functions. Each layer raises the cost of exploitation without eliminating the underlying vulnerability class.
Apple's Memory Integrity Enforcement is the hardware layer. Built on Arm's Memory Tagging Extension and Apple's Enhanced Memory Tagging Extension work, MIE assigns tags to memory allocations and checks access in hardware. Every pointer carries a matching tag. When a program dereferences a pointer, when it tries to read or write to an address, the hardware checks whether the pointer's tag matches the memory region's tag. If they do not match, the access is blocked before the attacker can turn the bug into a useful primitive.
The analogy is a building with colored keycards and colored doors. Every door has a color. Every keycard has a color. You can only open a door if your keycard's color matches. A memory corruption bug is an attacker who has gotten hold of a keycard, but MIE means the keycard only works on one specific door, not on the kernel's memory, even if the attacker now controls what the keycard says.
Apple spent five years refining this system on Apple Silicon. Its own security research said MIE disrupted every public exploit chain against modern iOS. Security researchers who had published full exploit chains against older iPhone hardware could not simply replay their work against MIE-enabled devices. It was, by the consensus of the security research community, one of the most significant memory safety improvements shipped on a consumer device.
How Mythos Changed the Math
The Calif team's post describes the research timeline with a specificity that is worth taking seriously.
The attack path was discovered accidentally on April 25. Researchers identified the bugs. By May 1, the exploit chain was working. Less than a week from bug discovery to working privilege escalation against the strongest consumer memory defense Apple has shipped.
The split between AI and human contribution is described honestly, which is unusual in this kind of disclosure. Mythos identified memory corruption bug candidates in the macOS kernel. The model reasoned across the codebase and flagged locations where memory handling could produce exploitable states under certain conditions. That is the discovery layer.
The human researchers then designed the bypass technique. Two vulnerabilities, chained with several additional exploit techniques, produced a data-only kernel local privilege escalation chain, one that does not inject shellcode but instead manipulates the kernel's own data structures to gain root access. That distinction matters because data-only exploits are specifically harder to detect and block than traditional code injection. There is no injected code to scan for.
Mythos did not build the exploit alone. The humans built the exploit with Mythos. What changed was the ratio of time spent finding bugs to time spent exploiting them. Finding was the bottleneck. Finding is no longer the bottleneck.
That is the shift. Not that AI can fully automate every exploit chain. It cannot, yet. But the human expertise that was previously spread across the entire research timeline, finding candidates, understanding their exploitability, designing the chain, testing against mitigations, is now concentrated much more heavily in the exploitation design phase. The discovery phase that previously consumed most of the research budget has been handed to the model.
Why This Connects to Everything Anthropic Said in April
On April 7, 2026, Anthropic announced Project Glasswing. The justification for gating Mythos Preview to selected launch partners and additional critical software organizations rather than releasing it broadly was explicit: the model's cybersecurity capabilities were sophisticated enough that broad release could give attackers meaningful capability before defenders had time to patch the vulnerabilities the model could find.
The Calif research is the clearest empirical validation of that concern published to date.
Mythos Preview helped identify bugs in the strongest consumer memory defense Apple has shipped, in Apple Silicon, in under a week, by a small security research team. Calif is not a nation-state threat actor. It is not a well-resourced intelligence agency with decades of accumulated exploit knowledge. It is a small security research firm that got access to a preview model and used it to do something that would previously have required a larger team, significantly more time, and a level of accumulated institutional knowledge that most organizations do not have.
Scale that access. Give the same model to a hundred teams with varying intentions and varying ethical frameworks. The diversity of what they find, and what they do with what they find, is not predictable from the research outputs that responsible disclosure produces.
This is precisely what Anthropic described when it said the model's capabilities were too dangerous for general availability. It is also precisely the capability that Project Glasswing was designed to direct toward defensive purposes: finding vulnerabilities before attackers do, coordinating with platform vendors on patches before public disclosure.
The Calif disclosure followed the Glasswing playbook: discover the vulnerability, report to Apple in person at Cupertino, and hold the technical details pending fixes. Apple's macOS Tahoe 26.5 security notes already include a kernel fix credited to Calif.io in collaboration with Claude and Anthropic Research. Calif says the full 55-page technical report will publish only after Apple ships fixes for the vulnerabilities and attack path.
The process worked. The responsible disclosure framework held. The question that Glasswing was designed to answer, whether AI-assisted vulnerability research can be directed toward defense rather than offense, got a data point in its favor this week.
The Part That Did Not Make the Press Release
There is a specific warning in Calif's post that has not received the attention it deserves.
Calif's point was simple: Apple built MIE before Mythos Preview existed, and the industry is about to learn how the world's best mitigation technology behaves during the first AI-driven vulnerability flood.
Every piece of security infrastructure that exists was designed in a world where vulnerability research moved at human speed. ASLR was designed by engineers thinking about human attackers doing manual memory analysis. Stack canaries were designed by engineers thinking about scripted exploit kits running against known vulnerability patterns. MIE was designed by engineers who had watched five years of iOS exploit chains and built a hardware defense against the techniques those chains used.
None of it was designed for a world where a three-person team with access to a frontier AI model can compress months of discovery work into days.
This is not a criticism of Apple's engineering. MIE is a genuine achievement, and the Calif credit in macOS Tahoe 26.5 reflects a security team that takes these reports seriously. The point is structural: the threat model that informed the design of every current security mitigation assumed human-speed research. That assumption has changed, and the mitigations have not yet caught up.
The security industry has a name for the period after a new offensive capability emerges but before defensive tooling adapts to it: a capability gap. The AI-assisted vulnerability research capability gap is open right now. Glasswing, Daybreak, AI-powered static analysis, and AI-assisted patch development are all responses to it. None of them are fully deployed. All of them are being built faster than they were before Mythos existed, because Mythos is proof that the urgency is real.
The Compression That Changes Everything
The Mandiant figure this publication cited in the Daybreak article is worth revisiting here: 28.3% of CVEs are now exploited within 24 hours of public disclosure.
That number reflects the current state of exploitation speed, the best human attackers, using the best existing tooling, moving as fast as they can. It does not yet fully reflect AI-assisted research pipelines at scale. The Calif research was not about exploitation speed after disclosure. It was about discovery speed before disclosure, the phase that has historically been the rate-limiting step in the research process.
What Mythos compressed was not the time between disclosure and exploit. It was the time between bug existence and bug discovery. A vulnerability in Apple's MIE implementation existed before April 25. It would have been found eventually. The question is by whom and when.
Without Mythos, the timeline for discovering a novel memory corruption bug in a new hardware security architecture on a fresh chip design is measured in months, sometimes years, by researchers with deep platform-specific expertise. With Mythos, the timeline is days, by a small team that was testing the model's capabilities and stumbled onto the attack path accidentally.
The word accidentally is the one that matters. The most significant consumer hardware security bypass of 2026 was not found by a targeted offensive research program. It was found as a side effect of capability testing, in under a week.
What Apple Does Next
Apple's response has been measured and appropriate. macOS Tahoe 26.5 includes at least one kernel fix credited to Calif.io in collaboration with Claude and Anthropic Research. Calif says it is withholding the full exploit report until Apple fixes the vulnerabilities and attack path.
The harder question for Apple is architectural. MIE was the defense layer built to stop memory corruption exploits. An exploit that bypasses MIE on M5 hardware does not just mean a patch needs to ship. It means the threat model that informed MIE's design needs to be revised to account for AI-accelerated research.
What does a memory safety system look like when it is designed against an attacker with AI-assisted discovery rather than human-speed discovery? The answer is probably more aggressive hardware enforcement, shorter patch cycles, and a tighter integration between Apple's own security research and AI-assisted vulnerability scanning.
Apple already runs a significant internal security research program. The upgrade path from where that program is today to one that uses AI-assisted research at the scale Mythos demonstrated is not technically out of reach. The question is whether the organizational investment follows the threat model revision fast enough to stay ahead of the capability gap.
The Glasswing coalition includes Apple. The conversation that needs to happen between Anthropic and Apple's security team, about how MIE's successor should be designed in a world where Mythos can compress discovery timelines from months to days, is already possible. Whether it happens is the more important question than the specific vulnerability that Calif found.
The Bugmageddon Framing
Calif called this a glimpse of what is coming. The framing was not hyperbole for marketing purposes. It was a precise technical statement about what happens when AI-accelerated discovery runs against a software ecosystem that was designed assuming human-speed research.
Every major operating system has vulnerabilities that have not been found yet. Not because they are hidden, but because finding them has required more time and expertise than the research community has collectively applied to looking. That constraint is changing.
The Shai-Hulud npm campaign that this publication covered last week compromised 42 packages in six minutes using AI-aware attack tooling. The Mythos research compressed months of hardware security research into days. The common thread is not a specific model or a specific attack. It is the compression of time as the rate-limiting variable in offensive security research.
Glasswing's bet is that the same compression can be applied to defensive research faster than it can be applied to offensive research. Finding vulnerabilities before attackers do. Coordinating patches before public disclosure. Building the defensive capability gap on top of the offensive one.
The Calif research is both a validation of that bet and a demonstration of what it is racing against. Apple built MIE before Mythos Preview existed. Every piece of security infrastructure in production today was built in that world.
The world has changed. The infrastructure has not caught up yet. And the model that changed it is still in preview, still gated behind Project Glasswing, still priced at $125 per million output tokens, while the open-weight models that are compressing toward its capability are available on Hugging Face under permissive licenses.
The gap between Glasswing's controlled access and the open-weight frontier is the gap that the next six months of AI security research will be fought across. Calif's exploit is the opening move of that fight, conducted responsibly, under disclosure, with Apple's security team in the room.
The next one may not come with a 55-page report hand-delivered to Cupertino.
Sources:
- First public macOS kernel memory corruption exploit on Apple M5 - Calif
- Memory Integrity Enforcement: A complete vision for memory safety in Apple devices - Apple Security Research
- Project Glasswing - Anthropic
- Assessing Claude Mythos Preview's cybersecurity capabilities - Anthropic Frontier Red Team
- About the security content of macOS Tahoe 26.5 - Apple Support
Previously on TheQuery: The npm Compromise Proved AI Is Not Safe From AI and Anthropic Gave Its Dangerous Model to Defenders