VoiceLayer

Voice I/O layer for AI coding assistants — local TTS, STT, session booking.

VoiceLayer adds voice input and output to Claude Code sessions via the Model Context Protocol (MCP). The agent speaks its questions aloud, records your spoken replies, and transcribes them locally with whisper.cpp, all without leaving your terminal.

Why VoiceLayer?

AI coding assistants are text-only by default, but some tasks go faster with voice:

QA testing — browse a page, speak what you see, let the agent take notes
Discovery calls — hands-free client interviews with automatic briefs
Code review — explain your reasoning while the agent captures it
Drilling sessions — interactive Q&A with voice responses

VoiceLayer bridges the gap between your terminal and your microphone.

How It Works

Claude Code  ─── MCP ───>  VoiceLayer
                            ├── edge-tts speaks question (speakers)
                            ├── sox records mic (16kHz mono PCM)
                            ├── whisper.cpp transcribes locally (~300ms)
                            └── Returns transcription to Claude
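The three tools in the diagram can each be driven from a plain argument list. The sketch below only builds those lists; it is not VoiceLayer's source. The function names and file paths are hypothetical. The 16-bit sample size and the `whisper-cli` binary name are assumptions that may differ on your install.

```typescript
// Hypothetical helpers: they build argv arrays for the three tools in the
// diagram. They do not run anything and are not VoiceLayer's actual code.

/** edge-tts: synthesize `text` into an audio file. */
function ttsArgs(text: string, outFile: string): string[] {
  return ["edge-tts", "--text", text, "--write-media", outFile];
}

/** sox `rec`: capture the microphone as 16 kHz mono PCM (16-bit is assumed). */
function recordArgs(outFile: string): string[] {
  return ["rec", "-r", "16000", "-c", "1", "-b", "16", outFile];
}

/** whisper.cpp CLI: transcribe a WAV file and print text without timestamps. */
function transcribeArgs(modelPath: string, wavFile: string): string[] {
  // Assumption: the binary is called `whisper-cli`; some builds name it differently.
  return ["whisper-cli", "-m", modelPath, "-f", wavFile, "-nt"];
}
```

Each array can be split into a command and its arguments, for example `const [cmd, ...args] = recordArgs("/tmp/answer.wav")`, and handed to whatever process-spawning API you use.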

1. Claude calls qa_voice_converse("How does the nav look on mobile?").
2. VoiceLayer speaks the question aloud via edge-tts.
3. Mic recording starts and the user speaks their response.
4. Recording ends when the user runs touch /tmp/voicelayer-stop or after 5 seconds of silence.
5. The audio is transcribed by whisper.cpp (local) or Wispr Flow (cloud fallback).
6. Claude receives the transcribed text and continues.
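The two ways a recording can end (the stop file or 5 seconds of silence) can be expressed as one small decision function. This is a minimal sketch under assumptions, not VoiceLayer's implementation: the `StopCheck` shape and `stopReason` name are hypothetical, and how silence is detected is left to the caller.

```typescript
/** Why a recording ended, or null if it should keep going. */
type StopReason = "stop-file" | "silence" | null;

interface StopCheck {
  stopFileExists: boolean; // e.g. whether /tmp/voicelayer-stop currently exists
  lastSoundAtMs: number;   // time of the last audio chunk judged non-silent
  nowMs: number;           // current time on the same clock
  silenceLimitMs?: number; // defaults to the 5 s mentioned in the steps above
}

function stopReason(check: StopCheck): StopReason {
  // The user's explicit stop always wins over the silence timer.
  if (check.stopFileExists) return "stop-file";
  const limit = check.silenceLimitMs ?? 5000;
  if (check.nowMs - check.lastSoundAtMs >= limit) return "silence";
  return null;
}
```

A recording loop would call `stopReason` on every poll and stop capturing as soon as it returns a non-null value.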

5 Voice Modes

Mode	Tool	What It Does	Blocking
Announce	`qa_voice_announce`	Fire-and-forget TTS (status updates)	No
Brief	`qa_voice_brief`	One-way explanation (reading back decisions)	No
Consult	`qa_voice_consult`	Speak checkpoint, user may respond	No
Converse	`qa_voice_converse`	Full voice Q&A — speak + record + transcribe	Yes
Think	`qa_voice_think`	Silent notes to markdown log	No
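For readers who want to see what a call to one of these tools looks like on the wire, here is a sketch of an MCP `tools/call` request built from the table above. The JSON-RPC envelope follows the MCP specification. The `message` argument name is an assumption, since this page does not document each tool's input schema, and `MODES` and `toolCall` are hypothetical names.

```typescript
type Mode = "announce" | "brief" | "consult" | "converse" | "think";

interface ModeInfo {
  tool: string;      // MCP tool name from the table
  blocking: boolean; // whether the call waits for the user's spoken reply
}

// Mirrors the mode table; only converse blocks.
const MODES: Record<Mode, ModeInfo> = {
  announce: { tool: "qa_voice_announce", blocking: false },
  brief: { tool: "qa_voice_brief", blocking: false },
  consult: { tool: "qa_voice_consult", blocking: false },
  converse: { tool: "qa_voice_converse", blocking: true },
  think: { tool: "qa_voice_think", blocking: false },
};

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

/** Build a JSON-RPC request that invokes the tool for the given mode. */
function toolCall(id: number, mode: Mode, message: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    // Assumption: the tool takes its text in an argument called `message`.
    params: { name: MODES[mode].tool, arguments: { message } },
  };
}
```

In practice Claude Code builds these requests for you; the sketch is only meant to make the mode-to-tool mapping concrete.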

Key Features

Local-first STT: whisper.cpp transcribes on-device (for example on Apple Silicon), and Wispr Flow is only a cloud fallback
Session booking: a lockfile mutex prevents mic conflicts between sessions
User-controlled stop: running touch /tmp/voicelayer-stop ends recording or playback
Per-mode speech rates: announce is snappy (+10%), brief is slow (-10%)
Auto-slowdown: long text is automatically spoken at a slower rate
Cross-platform: runs on macOS and Linux
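Two of the features above, session booking and speech rates, lend themselves to a short illustration. The sketch below is one possible approach, not VoiceLayer's code. The lock uses Node's exclusive-create flag (`"wx"`) so only one process can create the file. The slowdown rule (5 points per 300 characters, capped at 15) is invented for the example, as are the function names and the default of +0% for modes whose rate this page does not state.

```typescript
import * as fs from "node:fs";

/** Try to book the mic. Returns true if this process now holds the lock. */
function acquireLock(lockPath: string): boolean {
  try {
    // "wx" fails with EEXIST when the file already exists, so creation is atomic.
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, String(process.pid));
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err;
  }
}

/** Release the booking; does nothing if the lockfile is already gone. */
function releaseLock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}

// Base rates from the feature list; other modes are assumed to use +0%.
const BASE_RATE: Record<string, number> = { announce: 10, brief: -10 };

/** Rate string in the "+10%" / "-10%" form, slowed down for long text. */
function speechRate(mode: string, text: string): string {
  const base = BASE_RATE[mode] ?? 0;
  // Assumed rule: 5 points slower per 300 characters, at most 15 points.
  const slowdown = Math.min(15, Math.floor(text.length / 300) * 5);
  const pct = base - slowdown;
  return `${pct >= 0 ? "+" : ""}${pct}%`;
}
```

A production lock would also need to handle stale lockfiles left behind by a crashed session, for example by checking whether the recorded PID is still alive.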

Quick Start

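# Install the recorder (sox) and the local transcriber (whisper.cpp) via Homebrew
brew install sox whisper-cpp
# Install the text-to-speech engine
pip3 install edge-tts
# Fetch VoiceLayer and install its dependencies
git clone https://github.com/EtanHey/voicelayer.git
cd voicelayer && bun install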

Then add the server to your .mcp.json, replacing /path/to/voicelayer with the directory you cloned into:

{
  "mcpServers": {
    "qa-voice": {
      "command": "bun",
      "args": ["run", "/path/to/voicelayer/src/mcp-server.ts"]
    }
  }
}
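If your .mcp.json already lists other servers, the new entry has to be merged in rather than pasted over them. The following sketch shows one way to do that merge as a pure function. The names `McpConfig` and `withVoiceLayer` are hypothetical, and reading and writing the file is left to the caller.

```typescript
interface McpServer {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServer>;
}

/** Return a copy of `config` with the qa-voice server added or replaced. */
function withVoiceLayer(config: McpConfig, repoDir: string): McpConfig {
  // Drop trailing slashes so the joined path has exactly one separator.
  const entry = repoDir.replace(/\/+$/, "") + "/src/mcp-server.ts";
  return {
    ...config,
    mcpServers: {
      ...config.mcpServers, // keep any servers that are already configured
      "qa-voice": { command: "bun", args: ["run", entry] },
    },
  };
}
```

For example, `JSON.stringify(withVoiceLayer(existing, "/path/to/voicelayer"), null, 2)` produces text you can write back to .mcp.json.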

See the full Quick Start guide for details.

Platform Support

Platform	TTS	Audio Player	STT	Recording
macOS	edge-tts	afplay (built-in)	whisper.cpp	sox/rec
Linux	edge-tts	mpv, ffplay, or mpg123	whisper.cpp	sox/rec
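The audio player column is the only one that differs between platforms, so player selection can be sketched as a small lookup. This is an illustration under assumptions rather than VoiceLayer's logic: the preference order on Linux simply follows the order in the table, and `pickPlayer` and the `isInstalled` callback are hypothetical.

```typescript
// Linux candidates, tried in the order they appear in the table.
const LINUX_PLAYERS = ["mpv", "ffplay", "mpg123"] as const;

/**
 * Choose an audio player for the platform, or null if none is usable.
 * `platform` uses Node's process.platform values ("darwin", "linux", ...).
 */
function pickPlayer(
  platform: string,
  isInstalled: (cmd: string) => boolean,
): string | null {
  if (platform === "darwin") return "afplay"; // ships with macOS
  if (platform === "linux") return LINUX_PLAYERS.find(isInstalled) ?? null;
  return null; // platforms outside the table are unsupported
}
```

Passing the availability check in as a callback keeps the function easy to test; a real caller might implement it by probing the PATH.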