Troubleshooting

Start here

The fastest path is the built-in debug skill: open Claude Code in your NanoClaw folder and run /debug. It knows the two-DB session architecture, the log locations, and how to query the session databases directly.

If you’re diagnosing by hand, check the logs first. The service writes stdout to logs/nanoclaw.log and stderr (warnings and errors) to logs/nanoclaw.error.log:

tail -f logs/nanoclaw.log          # full routing chain: inbound, spawn/exit, delivery
tail -n 50 logs/nanoclaw.error.log # delivery failures, crash-loop backoff, warnings

Set LOG_LEVEL=debug (see Configuration) to also see resolved mount configurations and streamed container stderr tagged with container=<group folder>. Containers run with --rm, so the host log is the only place container output survives.

Then get the system state from the ncl CLI:

ncl sessions list           # container_status: running | stopped
ncl dropped-messages list   # messages the router or access gate refused
ncl wirings list            # which chats route to which agents, and how they engage
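
With LOG_LEVEL=debug set, the container tag makes it possible to pull one group's streamed stderr out of the host log. The following is a minimal sketch, assuming the tag appears literally as container=<group folder> on each streamed line. The helper name container_lines and the folder name in the usage comment are hypothetical.

```shell
# Print the host-log lines tagged for one group's container.
# container_lines is a hypothetical helper; the tag format is assumed
# to be the literal text container=<group folder>.
container_lines() {
  group="$1"
  shift
  # -h: no filename prefix, -F: fixed-string match, -e: pattern follows
  grep -hF -e "container=$group" "$@"
}

# Usage (folder name is illustrative):
#   container_lines my-group logs/nanoclaw.log logs/nanoclaw.error.log | tail -n 40
```

Because this is a substring match, a folder name that is a prefix of another (for example team and team-ops) will match both.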

Host won’t start

“NanoClaw stopped: update did not go through the supported path” means the upgrade tripwire fired. The code version doesn’t match the marker in data/upgrade-state.json, which usually means you ran a raw git pull instead of /update-nanoclaw. See Upgrading for the recovery path.
“Circuit breaker: delaying startup due to repeated crashes” in the log means the host crashed repeatedly and is backing off (up to 15 minutes after six consecutive crashes). It will retry on its own; find the original crash in logs/nanoclaw.error.log (look for FATAL Uncaught exception) and fix that instead of restarting in a loop.
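
To locate the crash behind a circuit-breaker delay, it can help to list the fatal entries with their line numbers and then read the earliest one in the current crash run. A small sketch follows; last_fatals is a hypothetical helper name, and the search string is the one quoted above.

```shell
# List the most recent fatal entries with their line numbers.
# last_fatals is a hypothetical helper, not part of NanoClaw.
last_fatals() {
  log="${1:-logs/nanoclaw.error.log}"
  count="${2:-5}"
  grep -n 'FATAL Uncaught exception' "$log" | tail -n "$count"
}

# Usage:
#   last_fatals                              # last 5 matches in the default log
#   last_fatals logs/nanoclaw.error.log 20   # last 20 matches
```

Each output line starts with the line number in the log file, so you can open the file at that position and read the stack trace that follows.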

“FATAL: Container runtime failed to start”

On startup the host runs docker info. If it fails, NanoClaw prints this banner and exits:

╔════════════════════════════════════════════════════════════════╗
║  FATAL: Container runtime failed to start                      ║
║                                                                ║
║  Agents cannot run without a container runtime. To fix:        ║
║  1. Ensure Docker is installed and running                     ║
║  2. Run: docker info                                           ║
║  3. Restart NanoClaw                                           ║
╚════════════════════════════════════════════════════════════════╝

Do exactly what it says: start Docker (Docker Desktop on macOS, sudo systemctl start docker on Linux), confirm docker info succeeds, then restart NanoClaw.
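
If Docker takes a while to come up, the "confirm docker info succeeds" step can be turned into a bounded wait. This is a sketch only: wait_for_runtime is a hypothetical helper, and the attempt count and delay are arbitrary choices, not values NanoClaw uses.

```shell
# Poll a readiness probe until it succeeds or the attempts run out.
# wait_for_runtime is a hypothetical helper; pass the probe command
# (here: docker info, the same check the host runs) after the two numbers.
wait_for_runtime() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: up to 30 probes, 2 seconds apart.
#   wait_for_runtime 30 2 docker info && echo "Docker is ready; restart NanoClaw"
```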

Agent never replies

Trace the message through the pipeline, in order:

Did the router accept it?

ncl dropped-messages list

Each row aggregates drops per chat, with a reason:

Reason	Meaning	Fix
`no_agent_wired`	No wiring exists for that chat	Create one: `ncl wirings create --messaging-group-id <id> --agent-group-id <id>`
`no_agent_engaged`	A wiring exists but its engage rules didn’t fire	Check `engage_mode`: `mention` needs an @mention (or a DM); `pattern` matches `engage_pattern` against each message
`unknown_sender_strict`	Sender not recognized, strict policy	Add the sender, or relax the policy — see Hardening
`unknown_sender_request_approval`	Sender not recognized; an Allow/Deny card was sent to an approver’s DM	The approver answers the card in their DM; `ncl dropped-messages list` shows the recorded drop either way

Did a container spawn?

ncl sessions list                      # container_status should be "running"
grep 'Spawning container' logs/nanoclaw.log | tail -5
docker ps --format '{{.Names}} {{.Status}}' | grep nanoclaw

stopped is normal between messages — the sweep restarts the container when due messages arrive. If nothing spawns at all, see Container won’t spawn below.

Did the agent produce a reply?

Inspect the session DBs (inbound.db / outbound.db under data/v2-sessions/<group>/<session>/). The /debug skill walks you through the exact queries. A Container exited non-zero warning in the host log carries the exit code and a stderrTail — the last ~10 lines of container stderr — which is where a boot failure usually shows. A clean or signal-killed exit logs Container exited at info level instead.
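
Before querying anything, it can be useful to confirm that both database files exist for each session. The sketch below assumes the folder layout given above (data/v2-sessions/<group>/<session>/); list_session_dbs is a hypothetical helper name.

```shell
# Report whether inbound.db and outbound.db exist for every session folder.
# list_session_dbs is a hypothetical helper; the layout
# <root>/<group>/<session>/ is taken from the path described above.
list_session_dbs() {
  root="${1:-data/v2-sessions}"
  for dir in "$root"/*/*/; do
    [ -d "$dir" ] || continue
    for db in inbound.db outbound.db; do
      if [ -f "$dir$db" ]; then state=present; else state=missing; fi
      printf '%s\t%s\n' "$state" "$dir$db"
    done
  done
}

# Usage:
#   list_session_dbs
```

If the files are SQLite databases (an assumption; this page does not say), sqlite3 -readonly <path> .tables would list their tables without modifying them. The /debug skill remains the source for the actual queries.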

Container won’t spawn

The first two of these abort the spawn; a rejected mount only degrades it (the container starts without that mount). All three appear in logs/nanoclaw.error.log:

OneCLI gateway not applied — refusing to spawn container without credentials: the host can’t wire the credential vault, so it refuses to launch the agent. The message stays pending and the sweep retries every minute; fix the vault connection and the spawn recovers on its own. Check that OneCLI is reachable at the URL set in ONECLI_URL in .env. On installs where containers reach the gateway, it binds to the Docker-bridge IP, not 127.0.0.1. See Credentials.
Egress lockdown errors — with lockdown enabled, the spawn fails fast rather than run with open egress: the "<network>" internal network could not be created or the OneCLI gateway "<container>" could not be attached to "<network>". Check docker network inspect for the egress network and see Hardening.
Additional mount REJECTED — a mount from your group’s container config failed allowlist validation. The log line includes the requestedPath and reason. Fix the allowlist — see Hardening.

Replies are slow

Cold start: the first reply after a container spawn takes 30–60 seconds while the sandbox warms up. Setup tells you this during its ping test; it’s normal.
Scheduled or retried messages wait for the sweep: the host sweep runs every 60 seconds, so a due message for a stopped container can sit up to a minute before the wake fires (Waking container for due messages in the log).
Retries back off: a message reset after a crash retries with exponential backoff (5s, 10s, 20s, …). After five tries you’ll see Message marked as failed after max retries — that message is dead; resend it.
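
The retry delays listed above follow a doubling pattern, which can be worked out as below. This is an illustration under assumptions: the 5-second base and the limit of five come from the text, but whether the host caps the delay or counts the first delivery as a try is not stated here. backoff_schedule is a hypothetical helper.

```shell
# Print the delays (in seconds) of a doubling backoff.
# backoff_schedule is a hypothetical helper; base and try count default
# to the values mentioned in the text, the exact mapping is assumed.
backoff_schedule() {
  delay="${1:-5}"
  tries="${2:-5}"
  n=1
  out=""
  while [ "$n" -le "$tries" ]; do
    out="$out${out:+ }$delay"   # append with a single space separator
    delay=$((delay * 2))
    n=$((n + 1))
  done
  printf '%s\n' "$out"
}

# Usage:
#   backoff_schedule 5 5    # prints: 5 10 20 40 80
```

Under these assumptions the waits add up to 155 seconds, so a message that keeps failing can take a few minutes before it is marked as failed.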

Container killed mid-task

The host sweep kills containers it considers stuck, then resets their in-flight messages to pending:

Killing container past absolute ceiling — the container’s heartbeat file went silent for over 30 minutes. The ceiling stretches automatically when the agent declares a longer Bash timeout, so long-running commands aren’t killed as long as they declare their timeout.
Killing container — message claimed then silent — the container claimed a message and showed no sign of life for over 60 seconds since the claim (also extended by a declared Bash timeout).

Both paths are self-healing: Reset stale message with backoff follows in the log and the work retries in a fresh container. If the same message keeps killing containers, look at the streamed container stderr (LOG_LEVEL=debug) to see what the agent was doing when it died.
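
To judge whether kills are a one-off or a pattern, you can count how often each of the messages above appears. A rough sketch follows; kill_summary is a hypothetical helper, and the search strings are substrings of the log messages quoted above, so adjust them if your log wording differs.

```shell
# Count sweep kills and resets across one or more log files.
# kill_summary is a hypothetical helper; patterns are substrings of the
# log messages quoted in this section.
kill_summary() {
  for pattern in \
    'Killing container past absolute ceiling' \
    'message claimed then silent' \
    'Reset stale message with backoff'
  do
    # grep -c prints 0 when nothing matches
    count=$(cat "$@" | grep -cF -e "$pattern")
    printf '%s\t%s\n' "$count" "$pattern"
  done
}

# Usage:
#   kill_summary logs/nanoclaw.log logs/nanoclaw.error.log
```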

Webhook channel is silent

Webhook-based adapters (Slack, Teams, and similar) register routes on a shared local HTTP server. Confirm it’s up and routed:

grep 'Webhook server started' logs/nanoclaw.log   # shows port + registered adapters
curl -i http://localhost:3000/webhook/slack       # 404 "Unknown adapter" = not registered

The server listens on WEBHOOK_PORT (default 3000). If another process already holds that port, the listen fails and the host crashes with FATAL Uncaught exception. WEBHOOK_PORT is read from the process environment only (src/webhook-server.ts:113), so to change it, set it in the service definition or your shell, not in .env, then restart. Channel-specific failures (tokens, tunnel/public URL, platform-side config) are covered on each channel page: Slack, Teams, Channels overview.
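
The curl check can be scripted by reading only the HTTP status code. The sketch below is an interpretation aid, not NanoClaw behavior: classify_webhook_status is a hypothetical helper, 000 is what curl reports through --write-out when it receives no HTTP response, and the 404 meaning is the one given above.

```shell
# Turn the status code from the curl probe into a short diagnosis.
# classify_webhook_status is a hypothetical helper.
classify_webhook_status() {
  case "$1" in
    000) echo "no HTTP response: nothing is listening on that port" ;;
    404) echo "server is up, adapter not registered" ;;
    *)   echo "server is up, route answered with HTTP $1" ;;
  esac
}

# Usage:
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     "http://localhost:${WEBHOOK_PORT:-3000}/webhook/slack")
#   classify_webhook_status "$code"
```

Any status other than 000 or 404 only shows that a route answered; whether the adapter accepts real platform traffic is covered on the channel pages.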

Credential request stuck

If an agent’s API call hangs and then fails, a credential approval card may be waiting. The gateway holds the request open until an admin taps Approve or Reject; with no eligible approver it auto-denies, and unanswered cards expire. Check what’s pending:

ncl approvals list --status pending

The full approval flow — who gets the card, expiry, what happens after a host restart — is in Credentials.

Resetting a session

To wipe a misbehaving session’s conversation state, remove its folder — the host re-provisions both session DBs on the next message:

ncl sessions list                              # find the session and its agent group
rm -rf data/v2-sessions/<group>/<session>/
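
Since rm -rf with an empty or mistyped placeholder can delete more than one session, a guarded wrapper is a safer way to run the command above. This is a sketch under assumptions: reset_session and the SESSIONS_ROOT variable are hypothetical, and the folder layout is the one shown above.

```shell
# Remove exactly one session folder, refusing empty or path-like arguments.
# reset_session and SESSIONS_ROOT are hypothetical names.
reset_session() {
  root="${SESSIONS_ROOT:-data/v2-sessions}"
  group="$1"
  session="$2"
  for part in "$group" "$session"; do
    case "$part" in
      ''|.|..|*/*)
        echo "usage: reset_session <group> <session>" >&2
        return 2
        ;;
    esac
  done
  target="$root/$group/$session"
  if [ ! -d "$target" ]; then
    echo "no such session folder: $target" >&2
    return 1
  fi
  rm -rf -- "$target"
}

# Usage (names are illustrative):
#   reset_session my-group my-session
```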

Getting help

Run /debug in Claude Code — it can read the logs and query the session DBs for you.
Join the Discord for help from other users.
Open an issue if you’ve found a bug.
