NanoClaw can understand images sent as message attachments using Claude’s multimodal capabilities. The agent sees the image content and can describe, analyze, or act on it.
Image vision is currently WhatsApp-only. This skill lives on the nanoclaw-whatsapp fork.

How it works

  1. A WhatsApp image attachment arrives
  2. The WhatsApp channel auto-downloads the image
  3. The image is resized using sharp (to fit within Claude’s input limits)
  4. The image is base64-encoded and passed to the agent as a multimodal content block
  5. Claude sees the image alongside the text message and can reason about it
The agent doesn’t need special instructions — it sees the image natively as part of the conversation.
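The pipeline above can be sketched in TypeScript. This is an illustrative sketch, not the skill's actual code: the function name `toImageBlock` is hypothetical, and the sharp resize call is shown only as a comment (the 1568px figure comes from Anthropic's documented guidance on image dimensions; treat it as an assumption here).

```typescript
// Hypothetical sketch: turn a downloaded WhatsApp image buffer into a
// Claude multimodal content block. Names here are illustrative, not
// the skill's real identifiers.
function toImageBlock(imageBuffer: Buffer, mediaType: string) {
  // In the real skill, sharp resizes the image first, roughly:
  //   sharp(imageBuffer)
  //     .resize({ width: 1568, height: 1568, fit: "inside",
  //               withoutEnlargement: true })
  //     .toBuffer()
  // (assumption: ~1568px per side keeps the image within Claude's limits)
  return {
    type: "image" as const,
    source: {
      type: "base64" as const,
      media_type: mediaType, // e.g. "image/jpeg"
      data: imageBuffer.toString("base64"),
    },
  };
}

// The image block is sent alongside the text in a single user message,
// so Claude sees both together:
const message = {
  role: "user" as const,
  content: [
    toImageBlock(Buffer.from([0xff, 0xd8, 0xff]), "image/jpeg"),
    { type: "text" as const, text: "@Andy what's in this image?" },
  ],
};
```

Because the image arrives as an ordinary content block in the conversation, no extra prompt engineering is needed on the agent side.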

Prerequisites

  • WhatsApp channel installed (/add-whatsapp)
  • The sharp library (installed automatically by the skill)

Installation

# On your nanoclaw-whatsapp fork
git fetch whatsapp skill/image-vision
git merge whatsapp/skill/image-vision
Or via Claude Code:
/add-image-vision
After merging, rebuild:
npm run build

Usage examples

Send an image to a WhatsApp group where the agent is active, then ask:
@Andy what's in this image?
@Andy extract the text from this screenshot
@Andy describe this chart
Last modified on March 19, 2026