Deprecated: v1 only; not needed in v2. The /add-image-vision skill no longer ships in v2. Image handling is now native to the Claude provider — JPEG, PNG, GIF, and WebP attachments are passed directly to Claude as image content blocks (see docs/architecture.md §“Native content blocks” in the framework repo) — so no skill installation is required. This page is pending deletion.
NanoClaw can understand images sent as message attachments using Claude’s multimodal capabilities. The agent sees the image content and can describe, analyze, or act on it.
Image vision is currently WhatsApp-only. This skill lives on the nanoclaw-whatsapp fork.

How it works

  1. A WhatsApp image attachment arrives
  2. The WhatsApp channel auto-downloads the image
  3. The image is resized using sharp (to fit within Claude’s input limits)
  4. The image is base64-encoded and passed to the agent as a multimodal content block
  5. Claude sees the image alongside the text message and can reason about it
The agent doesn’t need special instructions — it sees the image natively as part of the conversation.
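Steps 4–5 above can be sketched in TypeScript. This is a minimal illustration under assumptions, not the skill’s actual code: `toImageBlock` is a hypothetical helper, and the sharp resize step (step 3) is elided — the sketch only shows how an already-resized image buffer becomes a base64 content block in the shape Claude’s Messages API accepts.

```typescript
// Hypothetical helper: wrap a (pre-resized) image buffer as a Claude
// multimodal image content block. The real skill's code may differ.
type ImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};

function toImageBlock(imageBuffer: Buffer, mimeType: string): ImageBlock {
  return {
    type: "image",
    source: {
      type: "base64",                          // base64-encoded image source
      media_type: mimeType,                    // e.g. "image/jpeg", "image/png"
      data: imageBuffer.toString("base64"),    // step 4: base64-encode
    },
  };
}

// Step 5: the block is sent alongside the text in the same message, e.g.:
// [{ type: "text", text: caption }, toImageBlock(imageBuffer, "image/jpeg")]
```

Because the image rides along as an ordinary content block, the agent needs no special prompting to “see” it — it is simply part of the conversation turn.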

Prerequisites

  • WhatsApp channel installed (/add-whatsapp)
  • The sharp library (installed automatically by the skill)

Installation

```bash
# On your nanoclaw-whatsapp fork
git fetch whatsapp skill/image-vision
git merge whatsapp/skill/image-vision
```

Or via Claude Code:

```
/add-image-vision
```

After merging, rebuild:

```bash
npm run build
```

Usage examples

Send an image to a WhatsApp group where the agent is active, then ask:

```
@Andy what's in this image?
@Andy extract the text from this screenshot
@Andy describe this chart
```
Last modified on May 2, 2026