Voice transcription

v1-only feature. The /add-voice-transcription skill no longer ships in v2 — it is not present on trunk, the channels branch, or the providers branch. Voice transcription was a v1 fork-only capability tied to WhatsApp via the deprecated nanoclaw-whatsapp fork. This page is pending deletion; the instructions below do not apply to v2 installs.

NanoClaw can transcribe voice messages so the agent understands audio content. Two options are available: cloud-based (OpenAI Whisper API) and fully local (whisper.cpp).

Voice transcription is currently WhatsApp-only. Both skills live on the nanoclaw-whatsapp fork.

Cloud transcription (Whisper API)

The /add-voice-transcription skill uses OpenAI’s Whisper API for transcription. Cost: ~$0.006 per minute of audio.
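As a rough budgeting aid, the per-minute price above can be turned into a quick estimate. This is a minimal sketch: the function name and the example volume (200 voice notes of 30 seconds each) are illustrative assumptions, and actual billing may round durations differently.

```python
# Approximate Whisper API price quoted above, in USD per minute of audio.
PRICE_PER_MINUTE_USD = 0.006


def estimate_cost_usd(total_seconds: float) -> float:
    """Return the approximate transcription cost for the given audio length."""
    return round(total_seconds / 60 * PRICE_PER_MINUTE_USD, 4)


# Hypothetical month: 200 voice notes averaging 30 seconds each (100 minutes).
monthly_seconds = 200 * 30
print(estimate_cost_usd(monthly_seconds))  # prints 0.6
```

At that volume the cloud option costs well under a dollar a month, so for light personal use the choice between cloud and local is usually about privacy rather than cost.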

Prerequisites

An OpenAI API key
WhatsApp channel installed (/add-whatsapp)

Installation

```
# On your nanoclaw-whatsapp fork
git fetch whatsapp skill/voice-transcription
git merge whatsapp/skill/voice-transcription
```

Or via Claude Code:

```
/add-voice-transcription
```

Configuration

Add your OpenAI API key to .env:

```
OPENAI_API_KEY=sk-...
```

How it works

1. A WhatsApp voice note arrives.
2. The WhatsApp channel auto-downloads the audio file.
3. The audio is sent to OpenAI’s Whisper API.
4. The transcription is injected into the message content before the agent sees it.

The agent receives the transcribed text as if the user had typed it — no special handling needed.
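The injection step can be pictured with a small sketch. Everything here is hypothetical and for illustration only: the `IncomingMessage` shape, the `with_transcript` helper, and the `transcribe` callable (which stands in for the Whisper API call) are assumptions, not NanoClaw's actual types or functions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class IncomingMessage:
    """Hypothetical message shape; the real channel type may differ."""
    sender: str
    content: str
    audio: Optional[bytes] = None  # raw voice-note bytes, if the message has any


def with_transcript(
    msg: IncomingMessage, transcribe: Callable[[bytes], str]
) -> IncomingMessage:
    """Return a copy of msg whose content is the transcript of its audio.

    Messages without audio are returned unchanged.
    """
    if msg.audio is None:
        return msg
    text = transcribe(msg.audio).strip()
    return replace(msg, content=text)


# Stand-in transcriber so the example runs without network access.
def fake_transcribe(audio: bytes) -> str:
    return " remind me to call Sam at five "


voice_note = IncomingMessage(sender="alice", content="", audio=b"\x00\x01")
print(with_transcript(voice_note, fake_transcribe).content)
# prints: remind me to call Sam at five
```

Passing the transcriber in as a callable keeps the injection logic independent of the backend, which is one plausible way the same flow could serve both the cloud and local options.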

Local transcription (whisper.cpp)

The /use-local-whisper skill switches from the cloud API to on-device transcription using whisper.cpp. No API key needed, no cost, fully offline.

Prerequisites

Apple Silicon Mac (recommended for performance)
Homebrew packages:
```
brew install whisper-cpp ffmpeg
```
A GGML model file (downloaded during setup)

Installation

```
# On your nanoclaw-whatsapp fork (requires voice-transcription first)
git fetch whatsapp skill/use-local-whisper
git merge whatsapp/skill/use-local-whisper
```

Or via Claude Code:

```
/use-local-whisper
```

How it works

Same flow as cloud transcription, but audio is processed locally using the whisper.cpp CLI instead of the OpenAI API. The tradeoff is speed — local transcription is slower than the API, especially on longer voice notes, but it’s free and private.
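A sketch of what the local path might look like follows. It rests on several assumptions: that the voice note is first converted with ffmpeg to 16 kHz mono 16-bit WAV (the input format whisper.cpp documents), that the Homebrew binary is named `whisper-cli` (older releases used a different name), and that the helper names and file paths are purely illustrative. The skill's actual implementation may differ.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_command(src: Path, wav: Path) -> List[str]:
    """Build an ffmpeg call that converts src to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        str(wav),
    ]


def whisper_command(model: Path, wav: Path) -> List[str]:
    """Build a whisper.cpp CLI call (binary name is an assumption)."""
    # -m: GGML model file, -f: input WAV,
    # -nt: no timestamps, -np: suppress non-result output
    return ["whisper-cli", "-m", str(model), "-f", str(wav), "-nt", "-np"]


def transcribe_locally(src: Path, model: Path) -> str:
    """Convert src to WAV, run whisper.cpp on it, and return the transcript."""
    wav = src.with_suffix(".wav")
    subprocess.run(ffmpeg_command(src, wav), check=True, capture_output=True)
    result = subprocess.run(
        whisper_command(model, wav), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


# Show the commands without running them (paths are hypothetical).
print(" ".join(ffmpeg_command(Path("note.ogg"), Path("note.wav"))))
print(" ".join(whisper_command(Path("ggml-base.bin"), Path("note.wav"))))
```

Both steps run as blocking subprocesses on the device, which is where the extra latency on longer voice notes comes from.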

Comparison

| | Whisper API | Local whisper.cpp |
| --- | --- | --- |
| Cost | ~$0.006/min | Free |
| Speed | Fast (cloud) | Slower (on-device) |
| Privacy | Audio sent to OpenAI | Fully local |
| Requirements | `OPENAI_API_KEY` | Apple Silicon, whisper-cpp, ffmpeg |
| Offline | No | Yes |

Related pages

Skills system: How skills work
WhatsApp integration: WhatsApp channel setup
