
This guide covers common issues you may encounter when running NanoClaw and how to resolve them.

Quick status check

Run these commands to get a quick overview of system health:
# 1. Is the service running?
launchctl list | grep nanoclaw
# Expected: PID  0  com.nanoclaw (PID = running, "-" = not running, non-zero exit = crashed)

# 2. Any running containers?
docker ps --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw

# 3. Any stopped/orphaned containers?
docker ps -a --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw

# 4. Recent errors in service log?
grep -E 'ERROR|WARN' logs/nanoclaw.log | tail -20

# 5. Are channels connected? (look for last connection event)
grep -E 'Connected|Connection closed|connection.*close' logs/nanoclaw.log | tail -5

# 6. Are groups loaded?
grep 'groupCount' logs/nanoclaw.log | tail -3
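
If you run these checks often, a small wrapper script can print them all in one pass (a convenience sketch built from the commands above; adjust paths if your install differs):
#!/usr/bin/env bash
# check-nanoclaw.sh: one-shot health overview, run from the repo root
set -u
echo '== service =='
launchctl list | grep nanoclaw || echo 'not loaded'
echo '== containers (running / all) =='
docker ps --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw || echo 'none running'
docker ps -a --format '{{.Names}} {{.Status}}' 2>/dev/null | grep nanoclaw || echo 'none'
echo '== recent errors =='
grep -E 'ERROR|WARN' logs/nanoclaw.log | tail -20
echo '== connection events =='
grep -E 'Connected|Connection closed|connection.*close' logs/nanoclaw.log | tail -5
echo '== groups loaded =='
grep 'groupCount' logs/nanoclaw.log | tail -3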

Agent not responding

Symptoms

Messages sent to the agent receive no response.

Diagnosis

1. Check if messages are being received:
grep 'New messages' logs/nanoclaw.log | tail -10
If no messages appear, the issue is with channel connectivity (see WhatsApp authentication issues below).

2. Check if containers are being spawned:
grep -E 'Processing messages|Spawning container' logs/nanoclaw.log | tail -10
If messages are received but no containers spawn, check trigger patterns.

3. Check for active containers:
docker ps --filter name=nanoclaw-
If containers are stuck running, they may have hit an infinite loop.

4. Check container logs:
docker logs nanoclaw-{group}-{timestamp}
Look for errors in the agent execution.

Solutions

Verify the trigger word in your message matches the configured trigger:
sqlite3 store/messages.db "SELECT name, trigger_pattern FROM registered_groups;"
The trigger is case-insensitive and must appear at the start of the message. For example, if the trigger is, say, claude, then "Claude, summarize this" matches but "hey claude" does not.
Ensure Docker is running:
docker info
If Docker is not running, start it:
  • macOS: Open Docker Desktop
  • Linux: sudo systemctl start docker
Check if the queue is blocked:
grep 'concurrency limit' logs/nanoclaw.log | tail -5
Stop stuck containers:
docker stop -t 1 $(docker ps -q --filter name=nanoclaw-)
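If no containers match, the command substitution leaves docker stop with an empty argument list and it errors out. Piping through xargs avoids that (GNU xargs needs -r to skip empty input; the BSD xargs on macOS skips it by default and accepts -r for compatibility):
docker ps -q --filter name=nanoclaw- | xargs -r docker stop -t 1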

Container timeout

Symptoms

  • Logs show Container timeout or timed out
  • Exit code 137 (SIGKILL) in container logs
  • Messages are lost after timeout

Diagnosis

# Check for recent timeouts
grep -E 'Container timeout|timed out' logs/nanoclaw.log | tail -10

# Check container log files
ls -lt groups/*/logs/container-*.log | head -10

# Read the most recent container log
cat groups/{group}/logs/container-{timestamp}.log

Solutions

Modify the group’s containerConfig in the database:
sqlite3 store/messages.db
UPDATE registered_groups 
SET container_config = json_set(
  COALESCE(container_config, '{}'),
  '$.timeout',
  3600000  -- 1 hour in milliseconds
)
WHERE name = 'Family Chat';
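To confirm the change, read the value back (json_extract is part of the JSON1 functions built into any recent sqlite3):
sqlite3 store/messages.db "SELECT name, json_extract(container_config, '$.timeout') FROM registered_groups;"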
Review the container log to see if the agent is stuck:
cat groups/{group}/logs/container-{timestamp}.log
Look for repeated patterns or lack of progress. If the agent is stuck, consider:
  • Simplifying the prompt
  • Adding constraints to the task
  • Reviewing recent changes to CLAUDE.md
The idle timeout (for graceful shutdown between messages) is currently equal to the hard timeout. This is a known issue. To fix it, edit src/config.ts and set:
export const IDLE_TIMEOUT = 5 * 60 * 1000; // 5 minutes
export const CONTAINER_TIMEOUT = 30 * 60 * 1000; // 30 minutes
Then rebuild:
npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Service stops after SSH logout (Linux)

Symptoms

  • NanoClaw works while SSH is connected but stops silently after disconnecting
  • systemctl --user status nanoclaw shows inactive after reconnecting
  • No error logs or crash — the service simply disappears
  • systemctl --user enable nanoclaw appears to succeed but the service still stops

Cause

By default, systemd sets Linger=no for users. When your last SSH session closes, systemd terminates all user-level processes — including the NanoClaw service. This is especially common on cloud VMs (EC2, GCP, Oracle Cloud).

Solution

The setup script automatically enables linger for non-root users. If you’re still experiencing this issue (e.g., you set up NanoClaw before this fix), run:
loginctl enable-linger
Verify it’s enabled:
loginctl show-user $USER | grep Linger
# Expected: Linger=yes
If loginctl enable-linger fails with a permission error, your system may require polkit authorization. Contact your system administrator, or run it with sudo and pass your username explicitly (sudo loginctl enable-linger $USER) so linger is enabled for your user rather than for root.

Known issues

Cursor rollback on error

The message cursor (lastAgentTimestamp) is now rolled back on container errors, provided no output was already sent to the user. If the agent fails after sending partial output, the cursor is not rolled back to prevent duplicate messages. If you need to manually reset the cursor, update it in router_state:
-- The cursor is stored as a JSON object mapping JIDs to timestamps
SELECT value FROM router_state WHERE key = 'last_agent_timestamp';
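To roll the cursor back manually for a single chat, rewrite that key with json_set (a sketch: the JID and timestamp below are placeholders; use the same value format you see in the SELECT output):
sqlite3 store/messages.db
-- Replace the JID and timestamp with real values before running
UPDATE router_state
SET value = json_set(value, '$."123456789@g.us"', 1700000000000)
WHERE key = 'last_agent_timestamp';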

IDLE_TIMEOUT and CONTAINER_TIMEOUT

The hard timeout is calculated as Math.max(configTimeout, IDLE_TIMEOUT + 30_000), ensuring a 30-second gap for graceful _close sentinel shutdown before the hard kill fires. With default settings (both 30 minutes), the hard timeout is 30 minutes 30 seconds. If containers are consistently hitting the hard timeout, consider lowering IDLE_TIMEOUT in src/config.ts to a shorter value (e.g., 5 minutes) so idle containers exit more promptly.

Kubernetes image garbage collection deletes nanoclaw-agent image

Symptoms

Container exited with code 125: pull access denied for nanoclaw-agent — the container image disappears after a few hours, even though you just built it.

Cause

If your container runtime has Kubernetes enabled (Rancher Desktop enables it by default), the kubelet runs image garbage collection when disk usage exceeds 85%. NanoClaw containers are ephemeral (run and exit), so nanoclaw-agent:latest is never protected by a running container. The kubelet sees it as unused and deletes it — often overnight when no messages are being processed.

Fix

Disable Kubernetes if you don't need it:
# Rancher Desktop
rdctl set --kubernetes-enabled=false

# Then rebuild the container image
./container/build.sh
Diagnosis
# Check k3s log for image GC activity (Rancher Desktop)
grep -i "nanoclaw" ~/Library/Logs/rancher-desktop/k3s.log
# Look for: "Removing image to free bytes" with the nanoclaw-agent image ID

# Check NanoClaw logs for image status
grep -E "image found|image NOT found|image missing" logs/nanoclaw.log
If you need Kubernetes enabled, set CONTAINER_IMAGE to an image stored in a registry that the kubelet won’t garbage-collect, or raise the kubelet’s GC thresholds.
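For example, you could push the image to a registry you control and point NanoClaw at it (a sketch: the ghcr.io path is a placeholder, and where CONTAINER_IMAGE is set, whether src/config.ts or an environment variable, depends on your setup):
docker tag nanoclaw-agent:latest ghcr.io/YOUR_USER/nanoclaw-agent:latest
docker push ghcr.io/YOUR_USER/nanoclaw-agent:latest
# then set CONTAINER_IMAGE=ghcr.io/YOUR_USER/nanoclaw-agent:latest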

Resume branches from stale tree position

Issue

When agent teams spawn subagent CLI processes, the subagents write to the same session JSONL. On subsequent query() resumes, the CLI reads the JSONL but may pick a stale branch tip, causing responses to land on a branch the host never receives.

Fix

Already fixed: the agent runner now passes resumeSessionAt with the last assistant message UUID to explicitly anchor each resume.

Container 401 errors

Symptoms

  • Container logs show 401 Unauthorized or authentication_error on API calls
  • Agent starts but fails immediately when trying to respond
  • Errors recur every few hours even after restarting

Cause

Short-lived OAuth tokens copied from the system keychain or ~/.claude/.credentials.json expire within hours. These tokens share the same sk-ant-oat01- prefix as long-lived tokens, so the mistake isn’t obvious. When they expire, every container launch fails with a 401.

Solution

Update the secret registered with OneCLI:
# Replace with a long-lived OAuth token or API key
onecli secrets create --name Anthropic --type anthropic --value YOUR_KEY --host-pattern api.anthropic.com
You can also use a long-lived OAuth token from claude setup-token:
onecli secrets create --name Claude --type oauth --value YOUR_TOKEN --host-pattern api.anthropic.com
After updating credentials, restart the service:
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
Never copy tokens from ~/.claude/.credentials.json into .env. These are short-lived keychain tokens that expire within hours. Use claude setup-token to generate a long-lived token instead.
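To test a credential directly against the API before registering it (a hedged sketch: API keys are sent in an x-api-key header as shown; OAuth tokens are typically sent as an Authorization: Bearer header instead):
# 200 = accepted, 401 = expired or wrong token type
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "x-api-key: $YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  https://api.anthropic.com/v1/models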

WhatsApp authentication issues

Symptoms

  • Logs show “QR” or “authentication required”
  • No messages are being received
  • Connection repeatedly drops

Diagnosis

# Check for QR code requests
grep -E 'QR|qr|authentication required' logs/nanoclaw.log | tail -5

# Check auth files exist
ls -la store/auth/

Solution

Re-authenticate with WhatsApp:
claude
/add-whatsapp
Scan the QR code with your phone, then restart the service:
launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Container mount issues

Symptoms

  • Agent cannot access expected files or directories
  • Logs show “Mount validated” or “Mount REJECTED”
  • Permission errors in container logs

Diagnosis

# Check mount validation logs
grep -E 'Mount validated|Mount.*REJECTED|mount' logs/nanoclaw.log | tail -10

# Verify the mount allowlist is readable
cat ~/.config/nanoclaw/mount-allowlist.json

# Check group's container_config in DB
sqlite3 store/messages.db "SELECT name, container_config FROM registered_groups;"

Solutions

Edit ~/.config/nanoclaw/mount-allowlist.json:
{
  "allowedRoots": [
    {
      "path": "/Users/you/Documents/project",
      "allowReadWrite": true,
      "description": "Project files"
    }
  ],
  "blockedPatterns": [".ssh", ".gnupg", ".aws", ".env", "credentials"],
  "nonMainReadOnly": true
}
Then restart the service. If the mount allowlist file did not exist at startup, NanoClaw will detect it once created without requiring a restart.
If you need to regenerate the default mount allowlist (for example, after a misconfiguration), re-run setup with the --force flag:
claude /setup --force
Without --force, setup skips the mount allowlist if it already exists to preserve your customizations.
Containers run as the host user (or uid 1000 on some systems). Ensure the host user can read the mounted directories:
ls -la /path/to/mount
Run a test container to verify mounts:
docker run -i --rm \
  -v /path/to/host:/workspace/test:ro \
  nanoclaw-agent:latest \
  ls -la /workspace/test

IPC issues

Symptoms

  • Messages sent via IPC don’t appear in the chat
  • Tasks don’t get scheduled
  • IPC files accumulate in data/ipc/{group}/

Diagnosis

# Check for failed IPC operations
ls -la data/ipc/errors/
cat data/ipc/errors/main-message-*.json

# Monitor IPC activity
grep 'IPC' logs/nanoclaw.log | tail -20

# Verify IPC directory permissions
ls -la data/ipc/main/
ls -la data/ipc/family-chat/

Solutions

Non-main groups can only send messages to their own chat. Verify the chatJid in the IPC file matches the group’s registered JID:
sqlite3 store/messages.db "SELECT jid, name, folder FROM registered_groups;"
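For example, to compare a pending IPC file against the registered JID (a quick sketch; family-chat is a placeholder folder name):
jq -r .chatJid data/ipc/family-chat/messages/*.json
sqlite3 store/messages.db "SELECT jid FROM registered_groups WHERE folder = 'family-chat';"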
IPC files must be valid JSON. Check for syntax errors:
cat data/ipc/errors/{group}-message-*.json | jq .
Write a test message:
echo '{"type":"message","chatJid":"me","text":"test"}' > \
  data/ipc/main/messages/test.json

# Watch logs for processing
tail -f logs/nanoclaw.log | grep IPC

Session artifact auto-pruning

NanoClaw automatically cleans up stale session artifacts on startup and then every 24 hours. This prevents disk usage from growing unbounded as sessions accumulate. Retention policy:
Artifact                          Retention   Notes
Session JSONLs and tool results   7 days      Active sessions (from DB) are never deleted
Debug logs                        3 days      Active sessions skipped
Todo files                        3 days      Active sessions skipped
Telemetry                         7 days      Active sessions skipped
Group logs                        7 days
The cleanup script reads active session IDs from SQLite and never deletes files belonging to active sessions. It runs 30 seconds after startup to avoid competing with initialization, then repeats on a 24-hour interval. You can run the cleanup manually with --dry-run to preview what would be deleted:
bash scripts/cleanup-sessions.sh --dry-run

Stale session auto-recovery

NanoClaw automatically detects and recovers from stale or corrupt sessions. When a container fails because the session transcript file (.jsonl) is missing — due to a crash mid-write, manual deletion, or disk-full condition — NanoClaw clears the broken session ID and lets the existing backoff mechanism in the group queue retry with a fresh session.

Stale sessions are detected by matching the error output against known patterns (no conversation found, ENOENT on .jsonl files, or session not found). Only these specific signals trigger session clearing; transient errors (network, API) fall through to the normal retry path.

You can identify stale session recovery in logs by looking for:
Stale session detected — clearing for next retry
No manual intervention is required.
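To check whether auto-recovery has fired recently:
grep 'Stale session detected' logs/nanoclaw.log | tail -5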

Manual recovery

If auto-recovery doesn’t trigger (e.g., the error message doesn’t match the expected patterns), you can manually clear a stale session:
sqlite3 store/messages.db "DELETE FROM sessions WHERE group_folder = '{group-folder}';"
Then restart the service or wait for the next retry.
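If you're unsure which group_folder to target, list the stored sessions first:
sqlite3 store/messages.db "SELECT * FROM sessions;"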

Session branching issues

Symptoms

  • Agent responses don’t appear in the conversation
  • Session transcript shows multiple branches
  • Concurrent CLI processes in session debug logs

Diagnosis

# Check for concurrent CLI processes
ls -la data/sessions/{group}/.claude/debug/

# Count unique SDK processes (each .txt file = one CLI subprocess)
# Multiple files = concurrent queries

# Check parentUuid branching in transcript
python3 -c "
import json
lines = open('data/sessions/{group}/.claude/projects/-workspace-group/{session}.jsonl').read().strip().split('\n')
for i, line in enumerate(lines):
  try:
    d = json.loads(line)
    if d.get('type') == 'user' and d.get('message'):
      # Root messages store parentUuid as null, so fall back to ROOT explicitly
      parent = (d.get('parentUuid') or 'ROOT')[:8]
      content = str(d['message'].get('content', ''))[:60]
      print(f'L{i+1} parent={parent} {content}')
  except Exception: pass
"

Solution

This issue was fixed by passing resumeSessionAt with the last assistant message UUID. If you’re still experiencing it, ensure you’re running the latest version of the agent runner.

Service management

Restart the service

launchctl kickstart -k gui/$(id -u)/com.nanoclaw

View live logs

tail -f logs/nanoclaw.log

Stop the service

Stopping the service does NOT kill running containers. They will continue running as orphaned processes; stop them with the docker stop command shown under Agent not responding.
launchctl bootout gui/$(id -u)/com.nanoclaw

Start the service

launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.nanoclaw.plist

Rebuild after code changes

npm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw

Getting help

If you’re still experiencing issues:
  1. Ask Claude Code directly: Run claude and describe the problem. Claude can read logs, check database state, and diagnose issues.
  2. Run the debug skill: claude /debug for guided troubleshooting.
  3. Check the Discord: Join the community for help from other users.
  4. Review recent changes: If the issue started after a customization, review what changed:
    git diff
    git log --oneline -10
    
Last modified on April 7, 2026