VoiceLayer
Voice I/O layer for AI coding assistants — local TTS, STT, session booking.
VoiceLayer adds voice input and output to Claude Code sessions via the Model Context Protocol (MCP). Speak questions aloud, record voice responses, and transcribe locally with whisper.cpp — all inside your terminal.
Why VoiceLayer?
AI coding assistants are text-only. But some tasks are faster with voice:
- QA testing — browse a page, speak what you see, let the agent take notes
- Discovery calls — hands-free client interviews with automatic briefs
- Code review — explain your reasoning while the agent captures it
- Drilling sessions — interactive Q&A with voice responses
VoiceLayer bridges the gap between your terminal and your microphone.
How It Works
Claude Code ─── MCP ───> VoiceLayer
├── edge-tts speaks question (speakers)
├── sox records mic (16kHz mono PCM)
├── whisper.cpp transcribes locally (~300ms)
└── Returns transcription to Claude
- Claude calls `qa_voice_converse("How does the nav look on mobile?")`
- VoiceLayer speaks the question aloud via edge-tts
- Mic recording starts and the user speaks their response
- Recording ends when the user touches `/tmp/voicelayer-stop` or after 5s of silence
- Audio is transcribed by whisper.cpp (local) or Wispr Flow (cloud fallback)
- Claude receives the transcribed text and continues
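The stop rule in the steps above can be sketched as a small tracker over incoming PCM samples. This is an illustration only, not VoiceLayer's actual code: the amplitude threshold and the names `SilenceTracker` and `shouldStop` are assumptions. Only the 16kHz mono format, the 5s silence window, and the stop file come from the description above.

```typescript
// Hypothetical sketch: decide when a recording should end.
const SAMPLE_RATE_HZ = 16000;    // from the diagram: 16kHz mono PCM
const SILENCE_LIMIT_SECONDS = 5; // from the steps: stop after 5s of silence
const SILENCE_THRESHOLD = 500;   // assumption: amplitude below this counts as silence

class SilenceTracker {
  private silentSamples = 0;

  // Feed one chunk of samples; returns the seconds of trailing silence so far.
  push(chunk: ArrayLike<number>): number {
    for (let i = 0; i < chunk.length; i++) {
      if (Math.abs(chunk[i]) < SILENCE_THRESHOLD) {
        this.silentSamples += 1;
      } else {
        this.silentSamples = 0; // any loud sample resets the silence window
      }
    }
    return this.silentSamples / SAMPLE_RATE_HZ;
  }
}

// stopFileExists would come from checking for /tmp/voicelayer-stop on disk.
function shouldStop(trailingSilenceSeconds: number, stopFileExists: boolean): boolean {
  return stopFileExists || trailingSilenceSeconds >= SILENCE_LIMIT_SECONDS;
}
```

Keeping the decision in a pure function like `shouldStop` makes the two stop conditions easy to test without a microphone; a real recorder would call it once per chunk.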
5 Voice Modes
| Mode | Tool | What It Does | Blocking |
|---|---|---|---|
| Announce | `qa_voice_announce` | Fire-and-forget TTS (status updates) | No |
| Brief | `qa_voice_brief` | One-way explanation (reading back decisions) | No |
| Consult | `qa_voice_consult` | Speak checkpoint, user may respond | No |
| Converse | `qa_voice_converse` | Full voice Q&A: speak + record + transcribe | Yes |
| Think | `qa_voice_think` | Silent notes to markdown log | No |
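As a rough illustration, the table can be expressed as typed data so a caller can look up which tool a mode maps to and whether it will block. The tool names and blocking flags are copied from the table; the type and function names (`VoiceMode`, `ModeInfo`, `toolFor`, `blockingModes`) are hypothetical and not part of VoiceLayer.

```typescript
// Sketch only: the five modes as a lookup table.
type VoiceMode = "announce" | "brief" | "consult" | "converse" | "think";

interface ModeInfo {
  tool: string;      // MCP tool name, as listed in the table
  blocking: boolean; // true if the call waits for a spoken reply
}

const VOICE_MODES: Record<VoiceMode, ModeInfo> = {
  announce: { tool: "qa_voice_announce", blocking: false },
  brief: { tool: "qa_voice_brief", blocking: false },
  consult: { tool: "qa_voice_consult", blocking: false },
  converse: { tool: "qa_voice_converse", blocking: true },
  think: { tool: "qa_voice_think", blocking: false },
};

// Look up the tool name for a mode.
function toolFor(mode: VoiceMode): string {
  return VOICE_MODES[mode].tool;
}

// List the modes that hold the session until the user has answered.
function blockingModes(): VoiceMode[] {
  const modes = Object.keys(VOICE_MODES) as VoiceMode[];
  return modes.filter(function (mode) {
    return VOICE_MODES[mode].blocking;
  });
}
```

Only Converse blocks, so it is the one mode an agent should avoid calling when it cannot wait for the user.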
Key Features
- 100% local STT: whisper.cpp on Apple Silicon, no cloud dependency
- Session booking: lockfile mutex prevents mic conflicts between sessions
- User-controlled stop: `touch /tmp/voicelayer-stop` ends recording or playback
- Per-mode speech rates: announce is snappy (+10%), brief is slow (-10%)
- Auto-slowdown: long text automatically gets a slower speech rate
- Cross-platform: macOS and Linux support
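The per-mode rates and auto-slowdown could be combined along these lines. This is a sketch under stated assumptions: only the +10% (announce) and -10% (brief) figures come from the list above. The 0% rate for the other modes, the 300-character threshold, the 5-point slowdown, and the name `speechRate` are all invented for illustration. The returned string uses the signed-percentage form that the edge-tts `--rate` option accepts.

```typescript
// Hypothetical sketch: choose a speech rate string for a spoken mode.
type SpokenMode = "announce" | "brief" | "consult" | "converse";

const BASE_RATE_PERCENT: Record<SpokenMode, number> = {
  announce: 10, // from the feature list: snappy
  brief: -10,   // from the feature list: slow
  consult: 0,   // assumption: not stated in this document
  converse: 0,  // assumption: not stated in this document
};

const LONG_TEXT_CHARS = 300; // assumption: what counts as "long text"
const LONG_TEXT_PENALTY = 5; // assumption: slowdown in percentage points

function speechRate(mode: SpokenMode, text: string): string {
  let percent = BASE_RATE_PERCENT[mode];
  if (text.length > LONG_TEXT_CHARS) {
    percent -= LONG_TEXT_PENALTY; // auto-slowdown for long text
  }
  const sign = percent >= 0 ? "+" : "-";
  return `${sign}${Math.abs(percent)}%`;
}
```

With these assumed values, `speechRate("announce", "Build passed")` yields `"+10%"`, and the same mode with more than 300 characters of text yields `"+5%"`.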
Quick Start
brew install sox whisper-cpp
pip3 install edge-tts
git clone https://github.com/EtanHey/voicelayer.git
cd voicelayer && bun install
Then add to your .mcp.json:
{
  "mcpServers": {
    "qa-voice": {
      "command": "bun",
      "args": ["run", "/path/to/voicelayer/src/mcp-server.ts"]
    }
  }
}
See the full Quick Start guide for details.
Platform Support
| Platform | TTS | Audio Player | STT | Recording |
|---|---|---|---|---|
| macOS | edge-tts | afplay (built-in) | whisper.cpp | sox/rec |
| Linux | edge-tts | mpv, ffplay, or mpg123 | whisper.cpp | sox/rec |
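One way to read the Audio Player column is as a per-platform selection rule, sketched below. The player names come from the table; treating the Linux players as an ordered preference list is an assumption, as are the names `pickAudioPlayer` and `LINUX_PLAYERS`. The platform strings match Node's `process.platform` values.

```typescript
// Hypothetical sketch: pick a playback command for the current platform.
type Platform = "darwin" | "linux";

// Assumption: try the Linux players in the order the table lists them.
const LINUX_PLAYERS: string[] = ["mpv", "ffplay", "mpg123"];

function pickAudioPlayer(platform: Platform, installed: string[]): string | null {
  if (platform === "darwin") {
    return "afplay"; // built into macOS, per the table
  }
  for (let i = 0; i < LINUX_PLAYERS.length; i++) {
    if (installed.indexOf(LINUX_PLAYERS[i]) !== -1) {
      return LINUX_PLAYERS[i];
    }
  }
  return null; // no supported player found
}
```

A `null` result is the point where a real setup script would ask the user to install one of the three Linux players.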