Voice transcription is currently WhatsApp-only. Both skills live on the
nanoclaw-whatsapp fork.
Cloud transcription (Whisper API)
The /add-voice-transcription skill uses OpenAI’s Whisper API for transcription.
Cost: ~$0.006 per minute of audio.
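As a rough illustration of what that rate means in practice, the sketch below multiplies audio duration by the quoted per-minute price. The usage figures are hypothetical, and the function ignores any rounding or minimum charge the provider may apply.

```python
from decimal import Decimal

# Rate quoted above: roughly $0.006 per minute of audio.
RATE_PER_MINUTE = Decimal("0.006")


def estimate_cost(total_seconds: int) -> Decimal:
    """Approximate USD cost of transcribing `total_seconds` of audio."""
    minutes = Decimal(total_seconds) / Decimal(60)
    return (minutes * RATE_PER_MINUTE).quantize(Decimal("0.0001"))


# Hypothetical usage: 40 voice notes a day, 30 seconds each, for 30 days.
monthly_seconds = 40 * 30 * 30  # 36,000 seconds = 600 minutes
print(estimate_cost(monthly_seconds))  # 3.6000
```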
Prerequisites
- An OpenAI API key
- WhatsApp channel installed (/add-whatsapp)
Installation
Configuration
Add your OpenAI API key to .env as OPENAI_API_KEY.
How it works
- A WhatsApp voice note arrives
- The WhatsApp channel auto-downloads the audio file
- The audio is sent to OpenAI’s Whisper API
- The transcription is injected into the message content before the agent sees it
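
The steps above can be sketched in Python as follows. This is an illustration of the flow, not the skill's actual code: the `IncomingMessage` shape, the function names, and the `[Voice: ...]` wrapper are all assumptions made for the example. The call to OpenAI uses the official Python SDK's `audio.transcriptions.create` method with the `whisper-1` model, and it assumes `OPENAI_API_KEY` has already been loaded into the environment.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical message shape; NanoClaw's real type may differ."""
    content: str
    audio_path: Optional[Path] = None  # set once the channel has downloaded the voice note


def transcribe_with_openai(audio_path: Path) -> str:
    """Send the audio file to OpenAI's transcription endpoint and return the text."""
    # Imported lazily so the rest of the sketch runs without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with audio_path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return result.text


def inject_transcription(
    message: IncomingMessage,
    transcribe: Callable[[Path], str] = transcribe_with_openai,
) -> IncomingMessage:
    """Return the message with the transcript placed in its content."""
    if message.audio_path is None:
        return message  # not a voice note; pass through unchanged
    text = transcribe(message.audio_path).strip()
    # The "[Voice: ...]" wrapper is an illustrative format, not the skill's confirmed output.
    return IncomingMessage(content=f"[Voice: {text}]", audio_path=message.audio_path)
```

Passing the transcriber in as a parameter keeps the injection step independent of the backend, so the same function could be reused with a local transcriber.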
Local transcription (whisper.cpp)
The /use-local-whisper skill switches from the cloud API to on-device transcription using whisper.cpp. No API key needed, no cost, fully offline.
Prerequisites
- Apple Silicon Mac (recommended for performance)
- Homebrew packages: whisper-cpp and ffmpeg
- A GGML model file (downloaded during setup)
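
A quick way to confirm these prerequisites is to look for the binaries on `PATH` and check that the model file exists. The sketch below is a hypothetical helper, not part of the skill. It assumes the Homebrew whisper-cpp package installs a binary named `whisper-cli` (older releases used a different name), and the model path in the usage note is a placeholder.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def missing_prerequisites(
    model_path: Path,
    binaries: Sequence[str] = ("ffmpeg", "whisper-cli"),  # binary names are assumptions
) -> List[str]:
    """Return a human-readable list of what is missing; an empty list means ready."""
    problems = [f"{name} not found on PATH" for name in binaries if shutil.which(name) is None]
    if not model_path.is_file():
        problems.append(f"model file not found: {model_path}")
    return problems
```

For example, `missing_prerequisites(Path("models/ggml-base.bin"))` returns an empty list when both tools and the (placeholder) model file are present.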
Installation
How it works
Same flow as cloud transcription, but audio is processed locally using the whisper.cpp CLI instead of the OpenAI API. The tradeoff is speed: local transcription is slower than the API, especially on longer voice notes, but it’s free and private.
Comparison
| | Whisper API | Local whisper.cpp |
|---|---|---|
| Cost | ~$0.006/min | Free |
| Speed | Fast (cloud) | Slower (on-device) |
| Privacy | Audio sent to OpenAI | Fully local |
| Requirements | OPENAI_API_KEY | Apple Silicon, whisper-cpp, ffmpeg |
| Offline | No | Yes |
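
To make the local column concrete, here is a minimal sketch of how on-device transcription with ffmpeg and the whisper.cpp CLI might be driven from Python. It is not the skill's implementation. The binary name `whisper-cli`, the function names, and the choice to write the WAV file next to the source are assumptions. The conversion step reflects that whisper.cpp expects 16 kHz mono 16-bit WAV input.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_command(source: Path, wav: Path) -> List[str]:
    """Convert the voice note to 16 kHz mono 16-bit WAV, the input whisper.cpp expects."""
    return ["ffmpeg", "-y", "-i", str(source), "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", str(wav)]


def whisper_command(model: Path, wav: Path, binary: str = "whisper-cli") -> List[str]:
    """Build the whisper.cpp call; --no-timestamps keeps the output as plain text."""
    return [binary, "-m", str(model), "-f", str(wav), "--no-timestamps"]


def transcribe_locally(source: Path, model: Path) -> str:
    """Run both steps and return the transcript printed on stdout."""
    wav = source.with_suffix(".wav")
    subprocess.run(ffmpeg_command(source, wav), check=True, capture_output=True)
    result = subprocess.run(whisper_command(model, wav), check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```

Building the commands as argument lists, rather than shell strings, avoids quoting problems with file names and makes each step easy to inspect before running it.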
Related pages
- Skills system — How skills work
- WhatsApp integration — WhatsApp channel setup