Configuration

VoiceLayer is configured through environment variables, with a few per-call tool parameters for silence detection and speech rate. Every setting has a sensible default, so basic usage requires no configuration.

Environment Variables

STT (Speech-to-Text)

Variable	Default	Description
`QA_VOICE_STT_BACKEND`	`auto`	Backend selection: `whisper`, `wispr`, or `auto`
`QA_VOICE_WHISPER_MODEL`	auto-detected	Absolute path to a whisper.cpp GGML model file
`QA_VOICE_WISPR_KEY`	—	Wispr Flow API key (cloud fallback only)

In `auto` mode, VoiceLayer checks for whisper.cpp first and falls back to Wispr Flow if `QA_VOICE_WISPR_KEY` is set.
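The selection rule above can be sketched as follows. This is a minimal illustration, not VoiceLayer's actual code: the function name `resolve_stt_backend`, the `whisper_available` flag (standing in for however whisper.cpp is detected), and the errors raised are all assumptions.

```python
from typing import Mapping


def resolve_stt_backend(env: Mapping[str, str], whisper_available: bool) -> str:
    """Return "whisper" or "wispr" following the documented selection order."""
    backend = env.get("QA_VOICE_STT_BACKEND", "auto")
    if backend in ("whisper", "wispr"):
        return backend  # explicit choice, no detection needed
    if backend != "auto":
        raise ValueError(f"unknown STT backend: {backend!r}")
    if whisper_available:
        return "whisper"  # local whisper.cpp is checked first
    if env.get("QA_VOICE_WISPR_KEY"):
        return "wispr"  # cloud fallback only when a key is set
    # Assumed behavior: the page does not say what happens when neither exists.
    raise RuntimeError("no STT backend available")
```

Passing `os.environ` as `env` would apply the rule to the current process environment.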

Model auto-detection scans ~/.cache/whisper/ for GGML files in this order:

1. ggml-large-v3-turbo.bin
2. ggml-large-v3-turbo-q5_0.bin
3. ggml-base.en.bin
4. ggml-base.bin
5. ggml-small.en.bin
6. ggml-small.bin
7. Any other ggml-*.bin file
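The scan order above could be implemented along these lines. This is a sketch under stated assumptions: `find_whisper_model` is a hypothetical name, and choosing the alphabetically first file in the final "any other" step is a guess, since the page does not say how ties are broken.

```python
from pathlib import Path
from typing import Optional

# Preferred model file names, in the documented priority order.
PREFERRED_MODELS = [
    "ggml-large-v3-turbo.bin",
    "ggml-large-v3-turbo-q5_0.bin",
    "ggml-base.en.bin",
    "ggml-base.bin",
    "ggml-small.en.bin",
    "ggml-small.bin",
]


def find_whisper_model(cache_dir: Path) -> Optional[Path]:
    """Return the first matching GGML model in cache_dir, or None."""
    for name in PREFERRED_MODELS:
        candidate = cache_dir / name
        if candidate.is_file():
            return candidate
    # Fallback: any other ggml-*.bin file (alphabetical order is an assumption).
    others = sorted(p for p in cache_dir.glob("ggml-*.bin") if p.is_file())
    return others[0] if others else None
```

For the default location, the call would be `find_whisper_model(Path.home() / ".cache" / "whisper")`.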

TTS (Text-to-Speech)

Variable	Default	Description
`QA_VOICE_TTS_VOICE`	`en-US-JennyNeural`	Microsoft edge-tts voice ID
`QA_VOICE_TTS_RATE`	`+0%`	Base speech rate (per-mode defaults layer on top)

Run `edge-tts --list-voices` for the full list of available voices. Popular choices:

Voice	Language	Style
`en-US-JennyNeural`	English (US)	Default, clear female
`en-US-GuyNeural`	English (US)	Male
`en-GB-SoniaNeural`	English (UK)	British female
`en-US-AriaNeural`	English (US)	Expressive female

Recording

Recording uses Silero VAD, a neural-network voice activity detector, for speech detection. The device's native sample rate is detected automatically, so no configuration is needed for any microphone (built-in, AirPods, USB, etc.).

Silence detection is configured per call via the `silence_mode` parameter on `voice_ask`:

Mode	Silence Duration	Use Case
`quick`	0.5s	Fast responses, short answers
`standard`	1.5s	Normal conversation
`thoughtful`	2.5s (default)	User pauses to think
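To illustrate how a silence duration could end a recording, the sketch below walks over per-frame speech decisions and stops once the trailing silence reaches the threshold for the chosen mode. It is not VoiceLayer's implementation: the function name, the boolean per-frame input (a VAD output that has already been thresholded), and the rule that silence only counts after speech has been heard are assumptions.

```python
from typing import Iterable, Optional

# Silence thresholds in seconds, taken from the table above.
SILENCE_SECONDS = {"quick": 0.5, "standard": 1.5, "thoughtful": 2.5}


def end_of_speech_index(
    speech_flags: Iterable[bool],
    frame_seconds: float,
    silence_mode: str = "thoughtful",
) -> Optional[int]:
    """Return the index of the frame where recording would stop, or None."""
    threshold = SILENCE_SECONDS[silence_mode]
    heard_speech = False
    silent_frames = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_frames = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_frames += 1
            if silent_frames * frame_seconds >= threshold:
                return i
    return None  # the stream ended before enough silence accumulated
```

With 0.25 s frames, `quick` mode stops after two consecutive silent frames, while `thoughtful` waits for ten.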

Output

Variable	Default	Description
`QA_VOICE_THINK_FILE`	`/tmp/voicelayer-thinking.md`	Path for the think mode markdown log
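As a summary of the variables and defaults listed in the tables above, here is a hedged sketch of reading them from an environment mapping. The `VoiceSettings` class and `load_settings` function are hypothetical names used only for illustration. `None` stands for "auto-detected" or "not set".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceSettings:
    stt_backend: str
    whisper_model: Optional[str]  # None means the model is auto-detected
    wispr_key: Optional[str]      # None means no cloud fallback
    tts_voice: str
    tts_rate: str
    think_file: str


def load_settings(env: Mapping[str, str] = os.environ) -> VoiceSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return VoiceSettings(
        stt_backend=env.get("QA_VOICE_STT_BACKEND", "auto"),
        whisper_model=env.get("QA_VOICE_WHISPER_MODEL"),
        wispr_key=env.get("QA_VOICE_WISPR_KEY"),
        tts_voice=env.get("QA_VOICE_TTS_VOICE", "en-US-JennyNeural"),
        tts_rate=env.get("QA_VOICE_TTS_RATE", "+0%"),
        think_file=env.get("QA_VOICE_THINK_FILE", "/tmp/voicelayer-thinking.md"),
    )
```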

Per-Mode Speech Rates

Each voice mode has a default rate that balances speed with clarity:

Mode	Default Rate	Rationale
announce	`+10%`	Quick status updates — snappy delivery
brief	`-10%`	Long explanations — slower for digestion
consult	`+5%`	Checkpoints — slightly fast, user may respond
converse	`+0%`	Conversational — natural speed

Rates are auto-adjusted for long text:

Text Length	Adjustment
< 300 chars	No change
300-599 chars	-5%
600-999 chars	-10%
1000+ chars	-15%

You can override the rate per call by passing the `rate` parameter to any TTS tool.
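The two tables above can be combined into a single rate calculation. The sketch below assumes the base rate, the per-mode default, and the length adjustment are simply added together, and that a per-call override is used as given. The page does not spell out either rule, so treat both as assumptions. The function names are hypothetical.

```python
import re
from typing import Optional

# Per-mode default rates in percent, from the table above.
MODE_RATES = {"announce": 10, "brief": -10, "consult": 5, "converse": 0}


def parse_rate(rate: str) -> int:
    """Convert a rate string such as "+10%" or "-5%" to an integer percent."""
    match = re.fullmatch(r"([+-]?\d+)%", rate.strip())
    if match is None:
        raise ValueError(f"invalid rate: {rate!r}")
    return int(match.group(1))


def length_adjustment(text: str) -> int:
    """Return the slowdown for long text, in percent."""
    n = len(text)
    if n < 300:
        return 0
    if n < 600:
        return -5
    if n < 1000:
        return -10
    return -15


def effective_rate(mode: str, text: str, base_rate: str = "+0%",
                   override: Optional[str] = None) -> str:
    """Combine base, mode, and length rates (additive combination is assumed)."""
    if override is not None:
        return override  # assumed: an explicit per-call rate wins outright
    total = parse_rate(base_rate) + MODE_RATES[mode] + length_adjustment(text)
    return f"{total:+d}%"
```

Under these assumptions, a 700-character message in `brief` mode would be spoken at `-20%`: the `-10%` mode default plus the `-10%` length adjustment.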

MCP Server Configuration

Basic Setup

{
  "mcpServers": {
    "voicelayer": {
      "command": "bunx",
      "args": ["voicelayer-mcp"]
    }
  }
}

With Environment Overrides

{
  "mcpServers": {
    "voicelayer": {
      "command": "bunx",
      "args": ["voicelayer-mcp"],
      "env": {
        "QA_VOICE_TTS_VOICE": "en-GB-SoniaNeural",
        "QA_VOICE_STT_BACKEND": "whisper"
      }
    }
  }
}
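If you generate this configuration from a script, a small helper can catch misspelled variable names before they are silently ignored. The sketch below is illustrative only: `build_mcp_config` is a hypothetical helper, and the set of known variables is limited to those listed on this page.

```python
import json
from typing import Mapping

# Variables documented on this page; anything else is treated as a typo.
KNOWN_VARIABLES = {
    "QA_VOICE_STT_BACKEND",
    "QA_VOICE_WHISPER_MODEL",
    "QA_VOICE_WISPR_KEY",
    "QA_VOICE_TTS_VOICE",
    "QA_VOICE_TTS_RATE",
    "QA_VOICE_THINK_FILE",
}


def build_mcp_config(env_overrides: Mapping[str, str]) -> dict:
    """Build the mcpServers entry, rejecting undocumented variable names."""
    unknown = sorted(set(env_overrides) - KNOWN_VARIABLES)
    if unknown:
        raise ValueError(f"unknown variables: {', '.join(unknown)}")
    server = {"command": "bunx", "args": ["voicelayer-mcp"]}
    if env_overrides:
        server["env"] = dict(env_overrides)
    return {"mcpServers": {"voicelayer": server}}


# Print the same structure as the example above.
print(json.dumps(build_mcp_config({
    "QA_VOICE_TTS_VOICE": "en-GB-SoniaNeural",
    "QA_VOICE_STT_BACKEND": "whisper",
}), indent=2))
```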

File Paths

VoiceLayer uses /tmp for all runtime files by default (the think mode log path can be changed with `QA_VOICE_THINK_FILE`):

File	Purpose
`/tmp/voicelayer-session.lock`	Session booking lockfile
`/tmp/voicelayer-stop`	User stop signal (touch to end)
`/tmp/voicelayer-tts-*.mp3`	Temporary TTS audio (auto-cleaned)
`/tmp/voicelayer-recording-*.wav`	Temporary recording (auto-cleaned)
`/tmp/voicelayer-thinking.md`	Think mode log (persistent until cleared)
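The stop signal in the table is a plain file, so another process can raise it by creating that file. A minimal sketch follows. The helper names are hypothetical, and how VoiceLayer polls for or clears the file is not described on this page.

```python
from pathlib import Path

STOP_FILE = Path("/tmp/voicelayer-stop")


def request_stop(stop_file: Path = STOP_FILE) -> None:
    """Create the stop file, the equivalent of `touch /tmp/voicelayer-stop`."""
    stop_file.touch()


def stop_requested(stop_file: Path = STOP_FILE) -> bool:
    """Report whether the stop file currently exists."""
    return stop_file.exists()
```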