Configuration¶
VoiceLayer is configured entirely via environment variables. All settings have sensible defaults — zero config required for basic usage.
Environment Variables¶
STT (Speech-to-Text)¶
| Variable | Default | Description |
|---|---|---|
QA_VOICE_STT_BACKEND |
auto |
Backend selection: whisper, wispr, or auto |
QA_VOICE_WHISPER_MODEL |
auto-detected | Absolute path to a whisper.cpp GGML model file |
QA_VOICE_WISPR_KEY |
— | Wispr Flow API key (cloud fallback only) |
Auto-detection (auto mode) checks for whisper.cpp first, falls back to Wispr Flow if QA_VOICE_WISPR_KEY is set.
Model auto-detection scans ~/.cache/whisper/ for GGML files in this order:
ggml-large-v3-turbo.binggml-large-v3-turbo-q5_0.binggml-base.en.binggml-base.binggml-small.en.binggml-small.bin- Any other
ggml-*.binfile
TTS (Text-to-Speech)¶
| Variable | Default | Description |
|---|---|---|
QA_VOICE_TTS_VOICE |
en-US-JennyNeural |
Microsoft edge-tts voice ID |
QA_VOICE_TTS_RATE |
+0% |
Base speech rate (per-mode defaults layer on top) |
Available voices — run edge-tts --list-voices for the full list. Popular choices:
| Voice | Language | Style |
|---|---|---|
en-US-JennyNeural |
English (US) | Default, clear female |
en-US-GuyNeural |
English (US) | Male |
en-GB-SoniaNeural |
English (UK) | British female |
en-US-AriaNeural |
English (US) | Expressive female |
Recording¶
| Variable | Default | Description |
|---|---|---|
QA_VOICE_SILENCE_SECONDS |
2 |
Seconds of silence before auto-stop (converse mode uses 5s override) |
QA_VOICE_SILENCE_THRESHOLD |
500 |
RMS energy threshold for silence detection (0-32767) |
Output¶
| Variable | Default | Description |
|---|---|---|
QA_VOICE_THINK_FILE |
/tmp/voicelayer-thinking.md |
Path for the think mode markdown log |
Per-Mode Speech Rates¶
Each voice mode has a default rate that balances speed with clarity:
| Mode | Default Rate | Rationale |
|---|---|---|
| announce | +10% |
Quick status updates — snappy delivery |
| brief | -10% |
Long explanations — slower for digestion |
| consult | +5% |
Checkpoints — slightly fast, user may respond |
| converse | +0% |
Conversational — natural speed |
Rates are auto-adjusted for long text:
| Text Length | Adjustment |
|---|---|
| < 300 chars | No change |
| 300-599 chars | -5% |
| 600-999 chars | -10% |
| 1000+ chars | -15% |
You can override per-call by passing the rate parameter to any TTS tool.
MCP Server Configuration¶
Basic Setup¶
{
"mcpServers": {
"qa-voice": {
"command": "bun",
"args": ["run", "/path/to/voicelayer/src/mcp-server.ts"]
}
}
}
With Environment Overrides¶
{
"mcpServers": {
"qa-voice": {
"command": "bun",
"args": ["run", "/path/to/voicelayer/src/mcp-server.ts"],
"env": {
"QA_VOICE_TTS_VOICE": "en-GB-SoniaNeural",
"QA_VOICE_STT_BACKEND": "whisper",
"QA_VOICE_SILENCE_THRESHOLD": "300"
}
}
}
}
File Paths¶
VoiceLayer uses /tmp for all runtime files:
| File | Purpose |
|---|---|
/tmp/voicelayer-session.lock |
Session booking lockfile |
/tmp/voicelayer-stop |
User stop signal (touch to end) |
/tmp/voicelayer-tts-*.mp3 |
Temporary TTS audio (auto-cleaned) |
/tmp/voicelayer-recording-*.wav |
Temporary recording (auto-cleaned) |
/tmp/voicelayer-thinking.md |
Think mode log (persistent until cleared) |