Container isolation — the blast radius is one session
A compromised agent is a compromised container, and every active session gets its own. The container runs as the unprivilegednode user (or your host uid/gid), with no added Linux capabilities, and --rm so nothing persists past exit. Its filesystem view is exactly the mount table: the session folder, the group workspace, and read-only shared code. No host home directory, no .ssh, no Docker socket. Its own container.json and composed CLAUDE.md are re-mounted read-only on top of the writable workspace, so the agent can read its configuration but not rewrite it to grant itself more.
The container is also disposable by design — the host kills it on a stale heartbeat and rebuilds the entire environment at next spawn, so a compromised process doesn’t outlive its session. See Container lifecycle.
No raw credentials in containers
The single most valuable thing to exfiltrate from an agent is an API key, so the invariant is that there are none: secrets live in the OneCLI Agent Vault on the host, and the vault’s gateway injects them into outbound HTTPS requests in transit. Where a tool insists on finding a credential file locally, it gets a stub whose secret value is the placeholderonecli-managed. A compromised agent can dump its environment, its filesystem, and /proc and find nothing worth stealing — it can use the credentials its group was granted (the gateway injects them per request), but it can never extract them to use elsewhere. If the vault is unreachable, the spawn aborts rather than falling back to raw keys.
Setup, per-group secret grants, and the explicit .env opt-out are in Credentials.
Egress lockdown — closing the exfiltration path
Vault injection works throughHTTPS_PROXY, and a proxy env var only governs clients that honor it. A compromised agent could open a raw socket — or run any non-proxy-aware tool — and ship data anywhere on the internet, bypassing credential injection, approvals, and audit. Egress lockdown closes that hole at the network layer: agent containers join a Docker --internal network with no route to the internet, where the vault gateway is the only reachable hop. The agent runs non-root without NET_ADMIN, so it can’t reconfigure its way out, and NanoClaw refuses to spawn at all if lockdown is on but can’t be established.
It’s off by default because it breaks by design: any workflow that needs a non-proxy-aware tool to reach the internet directly will fail. Turn it on when agents face untrusted audiences or hold credentials worth stealing — setup in Hardening.
The mount allowlist — bounding filesystem reach
Mounts define the agent’s entire filesystem reach, so extending them is the most direct way to widen the blast radius. Per-groupadditional_mounts requests are therefore validated against an allowlist at ~/.config/nanoclaw/mount-allowlist.json that is deny-by-default — no allowlist file means no additional mounts — and lives outside the project root precisely so no container can ever reach it. Built-in blocked patterns (.ssh, .aws, .env, id_rsa, and friends) can’t be removed, host paths are resolved through symlinks before checking so the agent can’t alias its way past the rules, and everything is read-only unless both the request and the matching root opt into read-write.
Rules and tooling in Hardening.
Who can talk to your agents
Every layer above limits what a compromised agent can do; this one limits who gets to attempt the compromise. Prompt injection needs a delivery channel, and the cheapest one is just messaging the bot. Three gates stand in the way:- Channel registration — a mention or DM in a chat NanoClaw has never seen doesn’t reach any agent. It escalates to a human (group admins, then global admins, then owners) with an approve/deny card; denied chats drop silently forever.
- Sender policies — each known chat carries an
unknown_sender_policy:strictsilently drops strangers,request_approvalasks an admin before admitting them,publiclets anyone in. Auto-created chats default torequest_approval. - Per-wiring scope —
sender_scope: knownrequires owner, admin, or group membership even in apublicchat. Crucially, messages refused here are never even accumulated as context, so a rejected sender can’t smuggle instructions into the agent’s next batch.
Human-in-the-loop approvals
Some actions are too consequential to leave to a model that might be acting on injected instructions, so they pause for a human. The approvals module delivers a card to an admin DM — group admins first, then global admins, then owners, and no reachable approver means auto-deny — and nothing happens until someone taps Approve:- Credential use — vault secrets can require per-request approval; the gateway holds the HTTPS request open while the card shows an admin the method, host, path, and body preview.
- Self-modification — an agent asking to install packages or add an MCP server into its own container is asking to expand its own capabilities, which is exactly what an attacker would ask for. Both actions queue an approval card.
- Channel registration — new chats reaching agents, as above.
ncl approvals, is in Credentials.
The command gate
Platform slash-commands are classified on the host before any container sees them. Commands that manipulate host-side CLI state (/login, /logout, /config, and friends) are dropped outright, and operational commands like /clear or /upload-trace require an owner or admin role — so a stranger in a public chat can’t wipe an agent’s context or exfiltrate a trace just by typing. The full classification is in Hardening.
Supply chain
The host process itself is in the trusted zone, so its dependencies matter: pnpm is configured with a three-dayminimumReleaseAge (freshly published package versions won’t resolve, which defeats most compromised-maintainer attacks) and an onlyBuiltDependencies allowlist so only four vetted native packages may run install scripts.
What this model doesn’t defend
Be honest about the gap: anything the agent can legitimately do, it can do while manipulated. No layer above distinguishes a sincere action from an injected one — they only bound the action space. A compromised agent can still:- Message anyone in its destinations — including posting attacker-chosen content to your team channel, or leaking conversation context to another chat it legitimately serves.
- Read and destroy its whole workspace — every session of an agent group shares the group folder read-write, including
CLAUDE.local.mdmemory — and it can read whatever its allowlisted mounts expose (read-write only where explicitly granted). - Spend through granted credentials — it can’t extract keys, but it can make authenticated API calls with everything its group was granted (per-request approval, where enabled, is the brake).
- Poison its own memory — instructions written to
CLAUDE.local.mdtoday shape every future session of that group.
Related pages
- Hardening — the how for egress, mounts, sender policies, and the command gate
- Credentials — vault mechanics, stubs, and approval cards
- Container lifecycle — the mount table and the spawn/kill cycle
- Isolation levels — choosing how much your agents share