
This guide covers common issues you may encounter when running NanoClaw and how to resolve them.

Quick status check

Run these commands to get a quick overview of system health:
# 1. Is the service running?
launchctl list | grep nanoclaw
# Expected: PID  0  com.nanoclaw (PID = running, "-" = not running, non-zero exit = crashed)

# 2. Any running containers?
docker ps --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw

# 3. Any stopped/orphaned containers?
docker ps -a --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw

# 4. Recent errors in service log?
grep -E 'ERROR|WARN' logs/nanoclaw.log | tail -20

# 5. Are channels connected? (look for last connection event)
grep -E 'Connected|Connection closed|connection.*close' logs/nanoclaw.log | tail -5

# 6. Are groups loaded?
grep 'groupCount' logs/nanoclaw.log | tail -3
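
If you run these checks often, a small wrapper script can print them all in one pass (a convenience sketch built from the commands above; adjust paths if your install differs):
#!/usr/bin/env bash
# check-nanoclaw.sh: one-shot health overview, run from the repo root
set -u
echo '== service =='
launchctl list | grep nanoclaw || echo 'not loaded'
echo '== containers (running / all) =='
docker ps --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw || echo 'none running'
docker ps -a --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw || echo 'none'
echo '== recent errors =='
grep -E 'ERROR|WARN' logs/nanoclaw.log | tail -20
echo '== connection events =='
grep -E 'Connected|Connection closed|connection.*close' logs/nanoclaw.log | tail -5
echo '== groups loaded =='
grep 'groupCount' logs/nanoclaw.log | tail -3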

Agent not responding

Symptoms

Messages sent to the agent receive no response.

Diagnosis

1. Check if messages are being received:
grep 'New messages' logs/nanoclaw.log | tail -10
If no messages appear, the issue is with channel connectivity (see WhatsApp authentication issues below).

2. Check if containers are being spawned:
grep -E 'Processing messages|Spawning container' logs/nanoclaw.log | tail -10
If messages are received but no containers spawn, check trigger patterns.

3. Check for active containers:
docker ps --filter name=nanoclaw-
If containers are stuck running, they may have hit an infinite loop.

4. Check container logs:
docker logs nanoclaw-{group}-{timestamp}
Look for errors in the agent execution.

Solutions

Verify the trigger word in your message matches the configured trigger:
sqlite3 store/messages.db "SELECT name, trigger_pattern FROM registered_groups;"
The trigger is case-insensitive and must appear at the start of the message. For example, if the trigger is, say, claude, then "Claude, summarize this" matches but "hey claude" does not.
Ensure Docker is running:
docker info
If Docker is not running, start it:
  • macOS: Open Docker Desktop
  • Linux: sudo systemctl start docker
Check if the queue is blocked:
grep 'concurrency limit' logs/nanoclaw.log | tail -5
Stop stuck containers:
docker stop -t 1 $(docker ps -q --filter name=nanoclaw-)
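If no containers match, the command substitution leaves docker stop with an empty argument list and it errors out. Piping through xargs avoids that (GNU xargs needs -r to skip empty input; the BSD xargs on macOS skips it by default and accepts -r for compatibility):
docker ps -q --filter name=nanoclaw- | xargs -r docker stop -t 1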

Container timeout

Symptoms

  • Logs show Container timeout or timed out
  • Exit code 137 (SIGKILL) in container logs
  • Messages are lost after timeout

Diagnosis

# Check for recent timeouts
grep -E 'Container timeout|timed out' logs/nanoclaw.log | tail -10

# Check container log files
ls -lt groups/*/logs/container-*.log | head -10

# Read the most recent container log
cat groups/{group}/logs/container-{timestamp}.log

Solutions

Modify the group’s containerConfig in the database:
sqlite3 store/messages.db
UPDATE registered_groups 
SET container_config = json_set(
  COALESCE(container_config, '{}'),
  '$.timeout',
  3600000  -- 1 hour in milliseconds
)
WHERE name = 'Family Chat';
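To confirm the change, read the value back (json_extract is part of the JSON1 functions built into any recent sqlite3):
sqlite3 store/messages.db "SELECT name, json_extract(container_config, '$.timeout') FROM registered_groups;"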
Review the container log to see if the agent is stuck:
cat groups/{group}/logs/container-{timestamp}.log
Look for repeated patterns or lack of progress. If the agent is stuck, consider:
  • Simplifying the prompt
  • Adding constraints to the task
  • Reviewing recent changes to CLAUDE.md
The idle timeout (for graceful shutdown between messages) is currently equal to the hard timeout. This is a known issue. To fix it, edit src/config.ts and set:
export const IDLE_TIMEOUT = 5 * 60 * 1000; // 5 minutes
export const CONTAINER_TIMEOUT = 30 * 60 * 1000; // 30 minutes
Then rebuild:
npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Service stops after SSH logout (Linux)

Symptoms

  • NanoClaw works while SSH is connected but stops silently after disconnecting
  • systemctl --user status nanoclaw shows inactive after reconnecting
  • No error logs or crash — the service simply disappears
  • systemctl --user enable nanoclaw appears to succeed but the service still stops

Cause

By default, systemd sets Linger=no for users. When your last SSH session closes, systemd terminates all user-level processes — including the NanoClaw service. This is especially common on cloud VMs (EC2, GCP, Oracle Cloud).

Solution

The setup script automatically enables linger for non-root users. If you’re still experiencing this issue (e.g., you set up NanoClaw before this fix), run:
loginctl enable-linger
Verify it’s enabled:
loginctl show-user $USER | grep Linger
# Expected: Linger=yes
If loginctl enable-linger fails with a permission error, your system may require polkit authorization. Contact your system administrator, or run it with sudo and pass your username explicitly (sudo loginctl enable-linger $USER) so linger is enabled for your user rather than for root.

Known issues

Cursor rollback on error

The message cursor (lastAgentTimestamp) is now rolled back on container errors, provided no output was already sent to the user. If the agent fails after sending partial output, the cursor is not rolled back to prevent duplicate messages. If you need to manually reset the cursor, update it in router_state:
-- The cursor is stored as a JSON object mapping JIDs to timestamps
SELECT value FROM router_state WHERE key = 'last_agent_timestamp';
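To roll the cursor back manually for a single chat, rewrite that key with json_set (a sketch: the JID and timestamp below are placeholders; use the same value format you see in the SELECT output):
sqlite3 store/messages.db
-- Replace the JID and timestamp with real values before running
UPDATE router_state
SET value = json_set(value, '$."123456789@g.us"', 1700000000000)
WHERE key = 'last_agent_timestamp';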

IDLE_TIMEOUT and CONTAINER_TIMEOUT

The hard timeout is calculated as Math.max(configTimeout, IDLE_TIMEOUT + 30_000), ensuring a 30-second gap for graceful _close sentinel shutdown before the hard kill fires. With default settings (both 30 minutes), the hard timeout is 30 minutes 30 seconds. If containers are consistently hitting the hard timeout, consider lowering IDLE_TIMEOUT in src/config.ts to a shorter value (e.g., 5 minutes) so idle containers exit more promptly.

Kubernetes image garbage collection deletes nanoclaw-agent image

Symptoms

Container exited with code 125: pull access denied for nanoclaw-agent — the container image disappears after a few hours, even though you just built it.

Cause

If your container runtime has Kubernetes enabled (Rancher Desktop enables it by default), the kubelet runs image garbage collection when disk usage exceeds 85%. NanoClaw containers are ephemeral (run and exit), so nanoclaw-agent:latest is never protected by a running container. The kubelet sees it as unused and deletes it — often overnight when no messages are being processed.

Fix

Disable Kubernetes if you don't need it:
# Rancher Desktop
rdctl set --kubernetes-enabled=false

# Then rebuild the container image
./container/build.sh
Diagnosis
# Check k3s log for image GC activity (Rancher Desktop)
grep -i "nanoclaw" ~/Library/Logs/rancher-desktop/k3s.log
# Look for: "Removing image to free bytes" with the nanoclaw-agent image ID

# Check NanoClaw logs for image status
grep -E "image found|image NOT found|image missing" logs/nanoclaw.log
If you need Kubernetes enabled, set CONTAINER_IMAGE to an image stored in a registry that the kubelet won’t garbage-collect, or raise the kubelet’s GC thresholds.
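For example, you could push the image to a registry you control and point NanoClaw at it (a sketch: the ghcr.io path is a placeholder, and where CONTAINER_IMAGE is set, whether src/config.ts or an environment variable, depends on your setup):
docker tag nanoclaw-agent:latest ghcr.io/YOUR_USER/nanoclaw-agent:latest
docker push ghcr.io/YOUR_USER/nanoclaw-agent:latest
# then set CONTAINER_IMAGE=ghcr.io/YOUR_USER/nanoclaw-agent:latest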

Resume branches from stale tree position

Issue

When agent teams spawn subagent CLI processes, the subagents write to the same session JSONL. On subsequent query() resumes, the CLI reads the JSONL but may pick a stale branch tip, causing responses to land on a branch the host never receives.

Fix

Already fixed: the agent runner now passes resumeSessionAt with the last assistant message UUID to explicitly anchor each resume.

Container 401 errors

Symptoms

  • Container logs show 401 Unauthorized or authentication_error on API calls
  • Agent starts but fails immediately when trying to respond
  • Errors recur every few hours even after restarting

Cause

Short-lived OAuth tokens copied from the system keychain or ~/.claude/.credentials.json expire within hours. These tokens share the same sk-ant-oat01- prefix as long-lived tokens, so the mistake isn’t obvious. When they expire, every container launch fails with a 401.

Solution

Update the secret registered with OneCLI:
# Replace with a long-lived OAuth token or API key
onecli secrets create --name Anthropic --type anthropic --value YOUR_KEY --host-pattern api.anthropic.com
You can also use a long-lived OAuth token from claude setup-token:
onecli secrets create --name Claude --type oauth --value YOUR_TOKEN --host-pattern api.anthropic.com
After updating credentials, restart the service:
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
Never copy tokens from ~/.claude/.credentials.json into .env. These are short-lived keychain tokens that expire within hours. Use claude setup-token to generate a long-lived token instead.
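To test a credential directly against the API before registering it (a hedged sketch: API keys are sent in an x-api-key header as shown; OAuth tokens are typically sent as an Authorization: Bearer header instead):
# 200 = accepted, 401 = expired or wrong token type
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "x-api-key: $YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  https://api.anthropic.com/v1/models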

WhatsApp authentication issues

Symptoms

  • Logs show “QR” or “authentication required”
  • No messages are being received
  • Connection repeatedly drops

Diagnosis

# Check for QR code requests
grep -E 'QR|qr|authentication required' logs/nanoclaw.log | tail -5

# Check auth files exist
ls -la store/auth/

Solution

Re-authenticate with WhatsApp:
claude
/add-whatsapp
Scan the QR code with your phone, then restart the service:
launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Container mount issues

Symptoms

  • Agent cannot access expected files or directories
  • Logs show “Mount validated” or “Mount REJECTED”
  • Permission errors in container logs

Diagnosis

# Check mount validation logs
grep -E 'Mount validated|Mount.*REJECTED|mount' logs/nanoclaw.log | tail -10

# Verify the mount allowlist is readable
cat ~/.config/nanoclaw/mount-allowlist.json

# Check group's container_config in DB
sqlite3 store/messages.db "SELECT name, container_config FROM registered_groups;"

Solutions

Edit ~/.config/nanoclaw/mount-allowlist.json:
{
  "allowedRoots": [
    {
      "path": "/Users/you/Documents/project",
      "allowReadWrite": true,
      "description": "Project files"
    }
  ],
  "blockedPatterns": [".ssh", ".gnupg", ".aws", ".env", "credentials"],
  "nonMainReadOnly": true
}
Then restart the service. If the mount allowlist file did not exist at startup, NanoClaw will detect it once created without requiring a restart.
If you need to regenerate the default mount allowlist (for example, after a misconfiguration), re-run setup with the --force flag:
claude /setup --force
Without --force, setup skips the mount allowlist if it already exists to preserve your customizations.
Containers run as the host user (or uid 1000 on some systems). Ensure the host user can read the mounted directories:
ls -la /path/to/mount
Run a test container to verify mounts:
docker run -i --rm \
  -v /path/to/host:/workspace/test:ro \
  nanoclaw-agent:latest \
  ls -la /workspace/test

IPC issues

Symptoms

  • Messages sent via IPC don’t appear in the chat
  • Tasks don’t get scheduled
  • IPC files accumulate in data/ipc/{group}/

Diagnosis

# Check for failed IPC operations
ls -la data/ipc/errors/
cat data/ipc/errors/main-message-*.json

# Monitor IPC activity
grep 'IPC' logs/nanoclaw.log | tail -20

# Verify IPC directory permissions
ls -la data/ipc/main/
ls -la data/ipc/family-chat/

Solutions

Non-main groups can only send messages to their own chat. Verify the chatJid in the IPC file matches the group’s registered JID:
sqlite3 store/messages.db "SELECT jid, name, folder FROM registered_groups;"
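For example, to compare a pending IPC file against the registered JID (a quick sketch; family-chat is a placeholder folder name):
jq -r .chatJid data/ipc/family-chat/messages/*.json
sqlite3 store/messages.db "SELECT jid FROM registered_groups WHERE folder = 'family-chat';"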
IPC files must be valid JSON. Check for syntax errors:
cat data/ipc/errors/{group}-message-*.json | jq .
Write a test message:
echo '{"type":"message","chatJid":"me","text":"test"}' > \
  data/ipc/main/messages/test.json

# Watch logs for processing
tail -f logs/nanoclaw.log | grep IPC

Session artifact auto-pruning

NanoClaw automatically cleans up stale session artifacts on startup and then every 24 hours. This prevents disk usage from growing unbounded as sessions accumulate. Retention policy:
Artifact                          Retention   Notes
Session JSONLs and tool results   7 days      Active sessions (from DB) are never deleted
Debug logs                        3 days      Active sessions skipped
Todo files                        3 days      Active sessions skipped
Telemetry                         7 days      Active sessions skipped
Group logs                        7 days
The cleanup script reads active session IDs from SQLite and never deletes files belonging to active sessions. It runs 30 seconds after startup to avoid competing with initialization, then repeats on a 24-hour interval. You can run the cleanup manually with --dry-run to preview what would be deleted:
bash scripts/cleanup-sessions.sh --dry-run

Stale session auto-recovery

NanoClaw automatically detects and recovers from stale or corrupt sessions. When a container fails because the session transcript file (.jsonl) is missing — due to a crash mid-write, manual deletion, or disk-full condition — NanoClaw clears the broken session ID and lets the existing backoff mechanism in the group queue retry with a fresh session.

Stale sessions are detected by matching the error output against known patterns (no conversation found, ENOENT on .jsonl files, or session not found). Only these specific signals trigger session clearing; transient errors (network, API) fall through to the normal retry path.

You can identify stale session recovery in logs by looking for:
Stale session detected — clearing for next retry
No manual intervention is required.
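To check whether auto-recovery has fired recently:
grep 'Stale session detected' logs/nanoclaw.log | tail -5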

Manual recovery

If auto-recovery doesn’t trigger (e.g., the error message doesn’t match the expected patterns), you can manually clear a stale session:
sqlite3 store/messages.db "DELETE FROM sessions WHERE group_folder = '{group-folder}';"
Then restart the service or wait for the next retry.
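If you're unsure which group_folder to target, list the stored sessions first:
sqlite3 store/messages.db "SELECT * FROM sessions;"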

Session branching issues

Symptoms

  • Agent responses don’t appear in the conversation
  • Session transcript shows multiple branches
  • Concurrent CLI processes in session debug logs

Diagnosis

# Check for concurrent CLI processes
ls -la data/sessions/{group}/.claude/debug/

# Count unique SDK processes (each .txt file = one CLI subprocess)
# Multiple files = concurrent queries

# Check parentUuid branching in transcript
python3 -c "
import json
lines = open('data/sessions/{group}/.claude/projects/-workspace-group/{session}.jsonl').read().strip().split('\n')
for i, line in enumerate(lines):
  try:
    d = json.loads(line)
    if d.get('type') == 'user' and d.get('message'):
      # Root messages store parentUuid as null, so fall back to ROOT explicitly
      parent = (d.get('parentUuid') or 'ROOT')[:8]
      content = str(d['message'].get('content', ''))[:60]
      print(f'L{i+1} parent={parent} {content}')
  except Exception: pass
"

Solution

This issue was fixed by passing resumeSessionAt with the last assistant message UUID. If you’re still experiencing it, ensure you’re running the latest version of the agent runner.

Service management

Restart the service

launchctl kickstart -k gui/$(id -u)/com.nanoclaw

View live logs

tail -f logs/nanoclaw.log

Stop the service

Stopping the service does NOT kill running containers. They will continue running as orphaned processes; stop them with the docker stop command shown under Agent not responding.
launchctl bootout gui/$(id -u)/com.nanoclaw

Start the service

launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.nanoclaw.plist

Rebuild after code changes

npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Getting help

If you’re still experiencing issues:
  1. Ask Claude Code directly: Run claude and describe the problem. Claude can read logs, check database state, and diagnose issues.
  2. Run the debug skill: claude /debug for guided troubleshooting.
  3. Check the Discord: Join the community for help from other users.
  4. Review recent changes: If the issue started after a customization, review what changed:
    git diff
    git log --oneline -10
    
Last modified on April 7, 2026