NanoClaw can understand images sent as message attachments using Claude’s multimodal capabilities. The agent sees the image content and can describe, analyze, or act on it.
Image vision is currently WhatsApp-only. This skill lives on the nanoclaw-whatsapp fork.
How it works
- A WhatsApp image attachment arrives
- The WhatsApp channel auto-downloads the image
- The image is resized using sharp (to fit within Claude’s input limits)
- The image is base64-encoded and passed to the agent as a multimodal content block
- Claude sees the image alongside the text message and can reason about it
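The resize, encode, and attach steps above can be sketched in TypeScript. This is a hypothetical, dependency-free illustration, not NanoClaw's actual code: the names `fitWithin`, `toBase64`, `buildContent`, and `MAX_EDGE` are invented for the example, and the 1568 px longest-edge cap is an assumption standing in for "Claude's input limits" (the real skill's limit is not stated here). The image block shape follows the base64 image source format of the Anthropic Messages API.

```typescript
// Hypothetical sketch of the image pipeline; not NanoClaw's implementation.
// Assumption: cap the longest edge at 1568 px. The skill's real limit may differ.
const MAX_EDGE = 1568;

type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

// Base64 image content block, as accepted by the Anthropic Messages API.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}

interface TextBlock {
  type: "text";
  text: string;
}

// Scale dimensions to fit inside maxEdge x maxEdge, keeping the aspect ratio
// and never enlarging (the same idea as sharp's fit: "inside").
function fitWithin(
  width: number,
  height: number,
  maxEdge: number = MAX_EDGE,
): { width: number; height: number } {
  const scale = Math.min(1, maxEdge / Math.max(width, height));
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Minimal base64 encoder so the sketch has no runtime dependencies.
// In Node, Buffer.from(bytes).toString("base64") produces the same result.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasB1 = i + 1 < bytes.length;
    const hasB2 = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit group.
    const triple =
      (bytes[i] << 16) | ((hasB1 ? bytes[i + 1] : 0) << 8) | (hasB2 ? bytes[i + 2] : 0);
    // Emit four 6-bit characters, padding with "=" for missing bytes.
    out += B64_ALPHABET[(triple >> 18) & 63];
    out += B64_ALPHABET[(triple >> 12) & 63];
    out += hasB1 ? B64_ALPHABET[(triple >> 6) & 63] : "=";
    out += hasB2 ? B64_ALPHABET[triple & 63] : "=";
  }
  return out;
}

// Combine the (already resized) image bytes and the accompanying text message
// into one user-turn content array: image block first, then the text.
function buildContent(
  image: Uint8Array,
  mediaType: ImageMediaType,
  text: string,
): Array<ImageBlock | TextBlock> {
  return [
    { type: "image", source: { type: "base64", media_type: mediaType, data: toBase64(image) } },
    { type: "text", text },
  ];
}
```

In a real Node setup the resize itself would typically be delegated to sharp, for example `sharp(input).resize({ width: 1568, height: 1568, fit: "inside", withoutEnlargement: true })`, with `fitWithin` only illustrating the arithmetic. Placing the image block before the text follows Anthropic's general guidance that Claude tends to do best when images come before the question about them.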
Prerequisites
- WhatsApp channel installed (/add-whatsapp)
- The sharp library (installed automatically by the skill)
Installation
Usage examples
Send an image to a WhatsApp group where the agent is active, then ask the agent to describe, analyze, or act on it.
Related pages
- Skills system — How skills work
- WhatsApp integration — WhatsApp channel setup
- Voice transcription — Another WhatsApp media skill