>_TheQuery

Agent as a Service Has Arrived. SaaS Did Not See It Coming.

By Addy · March 20, 2026

On January 25, 2026, an Austrian developer named Peter Steinberger built an AI agent in roughly an hour. He called it OpenClaw. Within weeks it had become the fastest-growing open source repository in GitHub history.

Then the security problems arrived. CVE-2026-25253 exposed 17,500+ running instances. Researchers found 824+ malicious skills on ClawHub. One researcher hijacked a running agent in under two hours. Meta banned it from work devices after an agent accessed an employee's machine without instruction and deleted her emails in bulk.

The capability was real. The trust infrastructure was not.

On March 17 at GTC 2026, Jensen Huang walked onstage in San Jose and announced NVIDIA's answer.


What NemoClaw Actually Is

NemoClaw is not a competing agent framework. It is the enterprise security layer that OpenClaw was missing, installed in a single command, sitting underneath the agent rather than replacing it.

$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
$ nemoclaw my-assistant connect

That is the entire installation. What happens underneath is more interesting.

OpenShell, NVIDIA's new open source runtime, wraps every agent in a kernel-level sandbox with deny-by-default network access. An out-of-process policy engine enforces guardrails that a compromised agent cannot override. Every network request, file access, and inference call is governed by declarative policy, written in YAML, versioned in git, and reviewable like code.
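The article describes policy as declarative YAML reviewed like code, but does not show the schema. The sketch below is purely hypothetical: every field name is an assumption chosen to illustrate deny-by-default networking and scoped file access, not NemoClaw's actual format.

```yaml
# Hypothetical NemoClaw-style policy - field names are illustrative,
# not the real schema. Versioned in git, reviewed like code.
version: 1
agent: my-assistant
network:
  default: deny            # deny-by-default: nothing egresses unless listed
  allow:
    - host: api.github.com
      ports: [443]
filesystem:
  read:
    - ~/projects/my-repo   # agent can read only this tree
  write:
    - ~/projects/my-repo/.agent-scratch
inference:
  local_model: nemotron
  cloud_escalation:
    enabled: true
    require_approval: true # a human confirms before data leaves the machine
    audit_log: true
```

Because the policy engine runs out of process, a compromised agent cannot edit this file at runtime to widen its own permissions.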

The privacy router is the part that matters most for the local-first story. By default, agents run on Nemotron models installed locally on whatever hardware is available - GeForce RTX PCs, RTX PRO workstations, DGX Station, DGX Spark. When a task exceeds local model capability, the router escalates to cloud frontier models through a controlled gateway. The agent never steps outside defined boundaries. The user sees exactly what was routed where.
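The local-first routing logic described above can be sketched in a few lines. This is an illustrative model of the behavior, not NemoClaw's API: the capability threshold, the `PrivacyRouter` name, and the log format are all assumptions.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a local-first privacy router. The threshold,
# class name, and log format are assumptions, not NemoClaw's actual API.

@dataclass
class PrivacyRouter:
    local_capability: float = 0.7          # tasks at or below this run locally
    audit_log: list = field(default_factory=list)

    def route(self, task: str, complexity: float) -> str:
        """Return 'local' or 'cloud' and record the decision."""
        target = "local" if complexity <= self.local_capability else "cloud"
        # Every escalation is logged: the user sees what was routed where.
        self.audit_log.append(
            {"task": task, "target": target, "complexity": complexity}
        )
        return target

router = PrivacyRouter()
print(router.route("refactor helper function", 0.3))        # local
print(router.route("cross-repo architecture review", 0.9))  # cloud
```

The key design property is that the default branch is local: cloud is reached only when a task exceeds the declared capability, and the decision leaves an audit trail either way.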

Jensen Huang described the philosophy from the GTC stage: "OpenClaw is the operating system for personal AI. This is the moment the industry has been waiting for: the beginning of a new renaissance in software."


Local First Is the Architecture, Not the Constraint

Most coverage of NemoClaw focuses on the security story. The more important story is the deployment philosophy underneath it.

The default inference path in NemoClaw is local. Nemotron models run on your hardware. Your data does not leave your machine unless the task specifically requires it and the policy explicitly permits it. The cloud is the exception, not the baseline.

This is a direct inversion of how AI has been deployed for the last three years.

The standard assumption since GPT-3 has been: send everything to the cloud, pay per token, accept that your data travels to someone else's servers. That assumption made sense when local hardware could not run capable models. It made less sense when Qwen3 0.6B started fitting in 522MB. It makes even less sense now that Nemotron 3 Super holds 120B parameters of knowledge at the inference cost of 12B active parameters.

NemoClaw is the deployment framework for a world where capable models fit on the hardware you already own. Local first. Private by design. Cloud as a last resort.

The practical difference for someone deploying a code review agent, a voice assistant, or a data analysis workflow: your codebase never leaves your machine during the 80% of tasks the local model can handle. Only the genuinely complex tasks - the ones that require frontier reasoning - escalate to the cloud, through a router that logs exactly what was sent and why.


What This Means for Different Agent Types

Code agents: Local Nemotron handles routine code generation, refactoring, and bug fixes on your codebase. Nothing leaves your repository. Cloud escalation happens only for architectural reasoning across large unfamiliar codebases - and that escalation is logged.

Voice agents: Local inference means no audio data routed to external servers for standard interactions. The privacy implication for healthcare, legal, and financial voice applications is significant. HIPAA and DPDP compliance becomes architecturally enforced rather than contractually promised.

Data analysis agents: Sensitive business data - revenue figures, customer records, internal reports - processed locally. The cloud sees only the aggregated question, not the underlying data, when escalation is required.

Workflow agents: Always-on, 24/7 agents running on dedicated local compute without per-token API costs compounding across every action. The economics are fundamentally different from cloud-only deployments.


The SaaS Connection

We published a piece on March 15 about the VC subsidy behind cheap AI - the argument that current AI pricing is underwritten by venture capital and that the bill has not arrived yet. Jensen Huang at GTC made a statement that connects directly to that concern.

"Every SaaS company will become a GaaS company." Governance as a Service. Not software that enables employees to do work - software that does the work through autonomous agents, with governance baked in.

That transition assumes the economics of running agents become predictable and sustainable. Local-first inference is a direct response to that requirement. A SaaS company paying $500/month in API costs to run cloud agents faces an unpredictable cost curve as usage scales. The same company running NemoClaw on local RTX hardware faces a fixed infrastructure cost that does not compound with every agent action.
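The cost divergence is simple arithmetic. The figures below are illustrative assumptions (a per-action price and an amortized hardware cost chosen to reproduce the $500/month example above), not published pricing.

```python
# Back-of-the-envelope: cloud agent costs compound per action,
# local costs stay flat. All numbers are illustrative assumptions.

def cloud_cost(actions_per_month: int, cost_per_action: float) -> float:
    # Cloud spend grows with every agent action.
    return actions_per_month * cost_per_action

def local_cost(hardware_amortized: float, power: float) -> float:
    # Local spend is fixed regardless of agent activity.
    return hardware_amortized + power

for actions in (10_000, 100_000, 1_000_000):
    print(f"{actions:>9} actions: cloud ${cloud_cost(actions, 0.005):>8,.0f}"
          f" vs local ${local_cost(250.0, 40.0):,.0f}")
```

At 100,000 actions the cloud bill matches the article's $500/month figure; at ten times the usage it is $5,000 while the local line has not moved. That flat curve is what makes always-on agents plannable.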

The architecture and the economics point in the same direction: local first is not a privacy preference. It is a business model.


The Honest Limitations

NemoClaw is in early alpha as of March 16. The GitHub repository is explicit about this - interfaces, APIs, and behavior may change. Real setup issues exist: Docker conflicts, cgroup problems, OOM kills on constrained hardware. The local inference path for Nemotron 3 Super 120B requires approximately 87GB of disk space and NVIDIA GPU hardware.

The security architecture is sound. The production readiness is not there yet. The recommended timeline from people who have actually run it: evaluate now, plan for Q3 2026 production deployment.

It also does not solve every security problem. NemoClaw addresses infrastructure-level security - sandboxing, policy enforcement, data routing. Application-level risks like prompt injection, malicious skills, and agent reasoning manipulation require additional layers that NemoClaw does not provide.

OpenClaw went from a one-hour side project to the infrastructure layer of enterprise AI in less than two months. NemoClaw is the layer that makes that trajectory sustainable. Both are moving faster than most enterprise software adoption cycles can track.


The Pattern That Keeps Emerging

This is the third piece TheQuery has published in two weeks tracking the same architectural convergence.

Nemotron 3 Super on March 11: store knowledge at scale, think at inference efficiency. Parameter Golf on March 18: how small can a capable model actually get? NemoClaw on March 16: local first, cloud as exception, privacy enforced by architecture.

Three independent signals. One direction. The next generation of AI deployment does not default to the cloud. It defaults to the hardware you already own, escalates selectively, and keeps your data where you put it.

Agent as a Service has arrived. The infrastructure to deploy it safely is now a single command away.


Sources:

Previously on TheQuery: The VC Subsidy Behind Cheap AI Will Not Last and The Model That Thinks With 12B Parameters but Knows Everything a 120B Model Knows - the economics and architecture this story builds on.