Image vision is currently WhatsApp-only. This skill lives on the
nanoclaw-whatsapp fork.
How it works
- A WhatsApp image attachment arrives
- The WhatsApp channel auto-downloads the image
- The image is resized using sharp (to fit within Claude’s input limits)
- The image is base64-encoded and passed to the agent as a multimodal content block
- Claude sees the image alongside the text message and can reason about it
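The resize, encode, and wrap steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the names `fitWithin`, `buildContent`, `prepareImageMessage`, and `Resizer`, and the `MAX_EDGE` value of 1568 pixels are assumptions made for the example. The content block shape follows the base64 image format of the Anthropic Messages API; how the skill hands it to the agent internally may differ. The resizer is injected so the sketch runs without sharp installed.

```typescript
import { Buffer } from "node:buffer";

// Content blocks in the shape used by the Anthropic Messages API.
type ContentBlock =
  | { type: "text"; text: string }
  | {
      type: "image";
      source: { type: "base64"; media_type: "image/jpeg"; data: string };
    };

// Assumed cap on the longest edge, in pixels; the skill's real limit may differ.
const MAX_EDGE = 1568;

// Compute dimensions that fit inside maxEdge without upscaling. This
// approximates sharp's `fit: "inside"` with `withoutEnlargement: true`.
function fitWithin(
  width: number,
  height: number,
  maxEdge: number = MAX_EDGE,
): { width: number; height: number } {
  const scale = Math.min(1, maxEdge / Math.max(width, height));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// Hypothetical resizer signature. With sharp installed it could be:
//   (buf) => sharp(buf)
//     .resize(MAX_EDGE, MAX_EDGE, { fit: "inside", withoutEnlargement: true })
//     .jpeg()
//     .toBuffer()
type Resizer = (input: Buffer) => Promise<Buffer>;

// Base64-encode JPEG bytes and pair them with the text of the message.
function buildContent(jpeg: Buffer, text: string): ContentBlock[] {
  return [
    {
      type: "image",
      source: {
        type: "base64",
        media_type: "image/jpeg",
        data: jpeg.toString("base64"),
      },
    },
    { type: "text", text },
  ];
}

// Full pipeline: resize the downloaded bytes, then build the multimodal content.
async function prepareImageMessage(
  raw: Buffer,
  text: string,
  resize: Resizer,
): Promise<ContentBlock[]> {
  const jpeg = await resize(raw);
  return buildContent(jpeg, text);
}
```

Under these assumptions, a 4000×3000 photo would be scaled to 1568×1176 before encoding, while an 800×600 image would pass through at its original size.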
Prerequisites
- WhatsApp channel installed (/add-whatsapp)
- The sharp library (installed automatically by the skill)
Installation
Usage examples
Send an image to a WhatsApp group where the agent is active, then ask the agent a question about it.
Related pages
- Skills system — How skills work
- WhatsApp integration — WhatsApp channel setup
- Voice transcription — Another WhatsApp media skill