VoiceLayer

Your AI agent can't hear you. VoiceLayer gives it ears and a voice.

Voice I/O for AI coding assistants. You type 40 words per minute. You speak 150. VoiceLayer adds voice input and output to Claude Code and any MCP client. Press F6, speak, ship.

  You ──🎤──> whisper.cpp ──> Claude Code ──> edge-tts ──🔊──> You
         STT (local)           MCP tools         TTS (free)

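Local-first. Free. Open-source. No API keys. Speech-to-text runs locally through whisper.cpp, so your recordings stay on your machine unless the Wispr Flow cloud fallback is used. Text-to-speech uses edge-tts, which sends the text to be spoken to Microsoft Edge's online TTS service. Part of the Golems ecosystem.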

Why VoiceLayer?

AI coding assistants are text-only. But some tasks are faster with voice:

QA testing — browse a page, speak what you see, let the agent take notes
Discovery calls — hands-free client interviews with automatic briefs
Code review — explain your reasoning while the agent captures it
Drilling sessions — interactive Q&A with voice responses

VoiceLayer bridges the gap between your terminal and your microphone.

How It Works

Claude Code  ─── MCP ───>  VoiceLayer
                            ├── Waits for any playing voice_speak audio
                            ├── edge-tts speaks question (speakers)
                            ├── sox records mic (native rate → resample to 16kHz)
                            ├── Silero VAD detects speech/silence
                            ├── whisper.cpp transcribes locally (~300ms)
                            └── Returns transcription to Claude

1. Claude calls voice_ask("How does the nav look on mobile?").
2. VoiceLayer waits for any prior voice_speak audio to finish, so the two never overlap.
3. VoiceLayer speaks the question aloud via edge-tts.
4. Mic recording starts at the device's native sample rate (auto-detected).
5. The audio is resampled to 16 kHz in real time and fed to Silero VAD for speech detection.
6. Recording ends on a user stop signal, VAD silence detection, or a timeout.
7. The audio is transcribed by whisper.cpp (local) or Wispr Flow (cloud fallback).
8. Claude receives the transcribed text and continues.
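
The three ways a recording can end (user stop signal, VAD silence, timeout) can be sketched as a single decision function. This is a minimal illustration, not VoiceLayer's actual code: the function name, the returned labels, and the 30-second default timeout are assumptions. The silence thresholds are the ones listed under Key Features.

```python
from typing import Optional

# Silence thresholds in seconds, as listed under Key Features.
SILENCE_THRESHOLDS = {"quick": 0.5, "standard": 1.5, "thoughtful": 2.5}


def stop_reason(
    elapsed_s: float,
    silence_s: Optional[float],
    stop_file_exists: bool,
    silence_mode: str = "standard",
    timeout_s: float = 30.0,  # assumed default, not taken from VoiceLayer
) -> Optional[str]:
    """Return why recording should end, or None to keep recording.

    elapsed_s: seconds since recording started.
    silence_s: seconds of continuous silence since the last detected speech,
               or None if no speech has been detected yet.
    """
    # The explicit user signal is checked first (a choice made in this sketch).
    if stop_file_exists:
        return "user_stop"
    if silence_s is not None and silence_s >= SILENCE_THRESHOLDS[silence_mode]:
        return "vad_silence"
    if elapsed_s >= timeout_s:
        return "timeout"
    return None
```

Treating "no speech yet" as `None` keeps the silence rule from firing before the user has said anything, which leaves the timeout as the only automatic exit in that case.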

Voice Tools

Tool	What It Does	Blocking	readOnly	destructive	idempotent
voice_speak	Non-blocking TTS — auto-selects announce/brief/consult/think	No	false	false	true
voice_ask	Blocking voice Q&A — speak question, record + transcribe	Yes	false	false	false

All 11 tools (2 primary + 9 backward-compat aliases) include MCP ToolAnnotations. No VoiceLayer tools are destructive.
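
For reference, the readOnly, destructive, and idempotent columns correspond to the hint fields of MCP ToolAnnotations (`readOnlyHint`, `destructiveHint`, `idempotentHint`). The sketch below writes the two primary tools as plain dictionaries and checks the "no destructive tools" claim. The dictionary layout and the helper name are illustrative only; how VoiceLayer actually registers its tools is not shown here.

```python
# Annotation values copied from the Voice Tools table; the hint names follow
# the MCP ToolAnnotations schema.
TOOLS = {
    "voice_speak": {
        "blocking": False,  # table column, not an MCP annotation field
        "annotations": {
            "readOnlyHint": False,
            "destructiveHint": False,
            "idempotentHint": True,
        },
    },
    "voice_ask": {
        "blocking": True,  # table column, not an MCP annotation field
        "annotations": {
            "readOnlyHint": False,
            "destructiveHint": False,
            "idempotentHint": False,
        },
    },
}


def destructive_tools(tools: dict) -> list:
    """Return the names of tools whose annotations mark them as destructive."""
    return sorted(
        name
        for name, spec in tools.items()
        if spec["annotations"]["destructiveHint"]
    )
```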

Mode-specific guidance: Announce, Brief, Consult, Converse, Think. Full reference: MCP Tools Reference.

Key Features

Local STT: whisper.cpp on Apple Silicon, no cloud required (Wispr Flow is a cloud fallback)
Session booking: a lockfile mutex prevents mic conflicts between sessions
User-controlled stop: touch ~/.local/state/voicelayer/stop-{token} to end recording, or let Silero VAD end it after a period of silence (quick 0.5 s, standard 1.5 s, thoughtful 2.5 s)
Per-mode speech rates: announce is snappy (+10%), brief is slow (-10%)
Auto-slowdown: long text is automatically spoken at a slower rate
Cross-platform: macOS and Linux support
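
Session booking with a lockfile can be sketched with an exclusive file create. This is a generic illustration of the lockfile-mutex technique, not VoiceLayer's implementation: the function names and the lock path are hypothetical, and stale-lock handling (for example, when the owning process crashes) is left out.

```python
import os
from pathlib import Path


def acquire_lock(lock_path: Path) -> bool:
    """Try to book the mic. Return True if this process now holds the lock."""
    try:
        # O_EXCL makes creation fail if the file already exists, so only one
        # process can win the race.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(lock_path: Path) -> None:
    """Release the booking; a missing file is not an error."""
    lock_path.unlink(missing_ok=True)
```

A second session that calls `acquire_lock` while the file exists gets `False` and can report that the mic is busy instead of recording over the first session.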

Quick Start

brew install sox whisper-cpp
pip3 install edge-tts

Then add the server to your .mcp.json (bunx is part of Bun, so Bun must be installed):

{
  "mcpServers": {
    "voicelayer": {
      "command": "bunx",
      "args": ["voicelayer-mcp"]
    }
  }
}
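
If the project already has an .mcp.json with other servers, the entry can be merged in rather than pasted over the file. A small sketch, assuming the existing file is valid JSON; the helper name `add_voicelayer` is hypothetical and not part of VoiceLayer.

```python
import json
from pathlib import Path

# The server entry from the Quick Start snippet.
VOICELAYER_ENTRY = {"command": "bunx", "args": ["voicelayer-mcp"]}


def add_voicelayer(config_path: Path) -> dict:
    """Add the voicelayer server to an .mcp.json file, keeping other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    # Create the mcpServers section if needed, then set only our own key.
    config.setdefault("mcpServers", {})["voicelayer"] = dict(VOICELAYER_ENTRY)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```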

See the full Quick Start guide for details, or read What is VoiceLayer? for a non-technical overview.

Platform Support

Platform	TTS	Audio Player	STT	Recording
macOS	edge-tts	afplay (built-in)	whisper.cpp	sox/rec
Linux	edge-tts	mpv, ffplay, or mpg123	whisper.cpp	sox/rec
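
Audio player selection based on the table above could look like the following. This is a sketch only: the function name is hypothetical, and trying the Linux players in the listed order (mpv, ffplay, mpg123) is an assumption, not something the source confirms.

```python
import shutil
import sys
from typing import Callable, Optional

# Linux candidates from the Platform Support table, in listed order.
LINUX_PLAYERS = ("mpv", "ffplay", "mpg123")


def pick_audio_player(
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the name of a usable audio player, or None if none is found."""
    if platform == "darwin":
        candidates = ("afplay",)  # built in on macOS
    elif platform.startswith("linux"):
        candidates = LINUX_PLAYERS
    else:
        return None  # platform not listed in the table
    for name in candidates:
        if which(name):  # shutil.which returns a path, or None if not on PATH
            return name
    return None
```

Passing `which` as a parameter makes the lookup easy to exercise without the players installed; by default it uses `shutil.which` against the current PATH.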