
MCP Tools Reference

VoiceLayer exposes 2 primary tools and 9 backward-compat aliases (11 total). All tools include MCP ToolAnnotations.

voice_speak

Non-blocking text-to-speech. Speaks a message aloud or, in think mode, logs it silently. If mode is omitted, it is auto-detected from the message content.

| Property | Value |
|---|---|
| Blocking | No |
| Requires mic | No |
| Session booking | No |
| readOnlyHint | false |
| destructiveHint | false |
| idempotentHint | true |
| openWorldHint | false |
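For orientation, this is roughly how the annotation values above would appear in the `annotations` field of an MCP tool definition returned by `tools/list`. The sketch builds the entry as a plain Python dict; the `description` text and the trimmed `inputSchema` are illustrative assumptions, not VoiceLayer's actual schema.

```python
import json

# Hypothetical tools/list entry for voice_speak. Annotation values come from the
# property table; description and inputSchema are abbreviated assumptions.
voice_speak_tool = {
    "name": "voice_speak",
    "description": "Non-blocking text-to-speech.",
    "inputSchema": {
        "type": "object",
        "properties": {"message": {"type": "string"}},
        "required": ["message"],
    },
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": False,
        "idempotentHint": True,
        "openWorldHint": False,
    },
}

# Print just the annotations block as JSON.
print(json.dumps(voice_speak_tool["annotations"], indent=2))
```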

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| message | string | Yes | | Text to speak or log (must be non-empty after trimming) |
| mode | string | No | auto | announce, brief, consult, think, or auto (auto-detect from content) |
| voice | string | No | jenny | Profile name or raw edge-tts voice ID |
| rate | string | No | (per mode) | Speech rate, e.g. -10% or +5%. Must match `^[+-]\d+%$` |
| category | string | No | insight | Think mode only: insight, question, red-flag, or checklist-update |
| replay_index | number | No | | Replay a cached message (0 = most recent); message is ignored |
| enabled | boolean | No | | Turn voice on or off instead of speaking |
| scope | string | No | all | What enabled toggles: all, tts, or mic (only used with enabled) |
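A client that wants to catch bad arguments before calling the tool could mirror the documented rules. The following is a minimal sketch: `validate_speak_args` is a hypothetical helper, it checks only what the parameter table states (non-empty trimmed message, the rate pattern, and the allowed mode, category, and scope values), and the server still performs its own validation.

```python
import re

# Allowed values copied from the voice_speak parameter table.
MODES = {"announce", "brief", "consult", "think", "auto"}
CATEGORIES = {"insight", "question", "red-flag", "checklist-update"}
SCOPES = {"all", "tts", "mic"}
RATE_PATTERN = re.compile(r"^[+-]\d+%$")


def validate_speak_args(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid.

    Hypothetical client-side pre-check, not part of VoiceLayer.
    """
    problems: list[str] = []
    message = args.get("message")
    if not isinstance(message, str) or not message.strip():
        problems.append("message must be a non-empty string")
    if "mode" in args and args["mode"] not in MODES:
        problems.append(f"unknown mode: {args['mode']!r}")
    # fullmatch also rejects a trailing newline that a bare `$` would allow.
    if "rate" in args and not RATE_PATTERN.fullmatch(str(args["rate"])):
        problems.append(f"rate must look like -10% or +5%, got {args['rate']!r}")
    if "category" in args and args["category"] not in CATEGORIES:
        problems.append(f"unknown category: {args['category']!r}")
    if "scope" in args:
        if args["scope"] not in SCOPES:
            problems.append(f"unknown scope: {args['scope']!r}")
        if "enabled" not in args:
            problems.append("scope is only used together with enabled")
    return problems


print(validate_speak_args({"message": "Build passed", "rate": "+5%"}))  # []
print(validate_speak_args({"message": "   ", "rate": "5%"}))  # two problems: empty message, bad rate
```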

Mode auto-detection (when mode is auto or omitted): insight:, note:, or TODO: → think; ? or "about to" → consult; more than 280 characters → brief; otherwise → announce.
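The auto-detection rules amount to a short ordered check. This is an illustration rather than VoiceLayer's code, and it rests on three assumptions: the rules are tried in the order listed, the markers are matched case-sensitively anywhere in the message, and the 280-character limit applies to the trimmed message.

```python
THINK_MARKERS = ("insight:", "note:", "TODO:")


def detect_mode(message: str) -> str:
    """Guess a voice_speak mode from message content (illustrative sketch).

    Assumptions: rules run in the documented order, markers match
    case-sensitively anywhere in the text, and length uses the trimmed text.
    """
    text = message.strip()
    if any(marker in text for marker in THINK_MARKERS):
        return "think"
    if "?" in text or "about to" in text:
        return "consult"
    if len(text) > 280:
        return "brief"
    return "announce"


print(detect_mode("TODO: add retry logic"))        # think
print(detect_mode("I'm about to drop the table"))  # consult
print(detect_mode("x" * 300))                      # brief
print(detect_mode("Tests passed."))                # announce
```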

Returns: `[mode] Spoke: "message"`, or `Noted (category): thought` in think mode.

Errors: empty message, edge-tts not installed, audio player missing.
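Clients invoke the tool with a standard MCP `tools/call` JSON-RPC request. The helper below builds one; `make_tool_call` and the request ids are illustrative, and only the envelope (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) comes from the MCP specification.

```python
import json


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (hypothetical helper)."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Speak a short status update in brief mode at a slightly slower rate.
print(make_tool_call(1, "voice_speak", {"message": "Deploy finished", "mode": "brief", "rate": "-10%"}))

# Log a think-mode note instead of speaking it.
print(make_tool_call(2, "voice_speak", {"message": "Cache key ignores locale", "mode": "think", "category": "red-flag"}))
```

The result's `content` array should carry the return string described above as a text item.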


voice_ask

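Blocking voice Q&A. Waits for any voice_speak audio that is still playing to finish, speaks the question aloud, and records from the microphone at the device's native sample rate (auto-detected). The recording is resampled to 16 kHz, Silero VAD detects when the user has stopped speaking, and whisper.cpp or Wispr Flow transcribes the audio. Returns the transcribed text.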
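The resampling step converts the recording from the device's native rate (commonly 44.1 or 48 kHz) to the 16 kHz that whisper.cpp models expect. VoiceLayer's actual resampler is not documented here; this dependency-free sketch uses linear interpolation purely to illustrate the rate conversion, and it omits the low-pass filtering a production resampler applies to avoid aliasing.

```python
import math


def resample_linear(samples: list[float], src_rate: int, dst_rate: int = 16_000) -> list[float]:
    """Resample mono audio by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last input sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# One second of a 440 Hz tone captured at 48 kHz becomes 16,000 samples.
rate = 48_000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
print(len(resample_linear(tone, rate)))  # 16000
```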

| Property | Value |
|---|---|
| Blocking | Yes |
| Requires mic | Yes |
| Session booking | Yes (auto-books on first call) |
| Auto-waits | Yes (waits for prior voice_speak playback) |
| readOnlyHint | false |
| destructiveHint | false |
| idempotentHint | false |
| openWorldHint | false |

Parameters:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| message | string | Yes | | Question to speak aloud (must be non-empty) |
| timeout_seconds | number | No | 30 | Maximum wait time in seconds (clamped to 5 to 3600) |
| silence_mode | string | No | thoughtful | How long a pause ends recording: quick (0.5 s), standard (1.5 s), or thoughtful (2.5 s) |
| press_to_talk | boolean | No | false | Push-to-talk mode: no VAD; recording stops only via the signal file |
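The timeout clamping and silence presets reduce to a few lines. A sketch, assuming each preset is the length of trailing silence that ends a recording, that out-of-range timeouts are clamped silently, and that an unknown silence_mode is rejected; `normalize_ask_options` is a hypothetical name.

```python
# Trailing-silence durations (seconds) per silence_mode, from the parameter table.
SILENCE_SECONDS = {"quick": 0.5, "standard": 1.5, "thoughtful": 2.5}


def normalize_ask_options(timeout_seconds: float = 30, silence_mode: str = "thoughtful") -> tuple[float, float]:
    """Return (timeout, silence_seconds) using the documented defaults and limits."""
    clamped = min(max(timeout_seconds, 5), 3600)  # documented clamp range: 5 to 3600
    if silence_mode not in SILENCE_SECONDS:
        raise ValueError(f"silence_mode must be one of {sorted(SILENCE_SECONDS)}")
    return clamped, SILENCE_SECONDS[silence_mode]


print(normalize_ask_options())                  # (30, 2.5)
print(normalize_ask_options(2, "quick"))        # (5, 0.5)
print(normalize_ask_options(9999, "standard"))  # (3600, 1.5)
```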

Returns:

- Success: the user's transcribed text, as a plain string.
- Timeout: `[converse] No response received within N seconds.`
- Busy: `[converse] Line is busy — voice session owned by...`, with `isError: true`.
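Because a successful answer and a timeout both come back as plain text, a client has to check the `[converse]` prefix and the `isError` flag to tell the outcomes apart. A minimal sketch, assuming the result has the MCP `content`/`isError` shape shown under Error Handling and that a real transcription never begins with `[converse] No response received`; `classify_ask_result` is hypothetical.

```python
def classify_ask_result(result: dict) -> tuple[str, str]:
    """Return (status, text) where status is 'answer', 'timeout', or 'error'."""
    text = "".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    # Check the timeout prefix first: the docs do not say whether timeouts set isError.
    if text.startswith("[converse] No response received"):
        return "timeout", text
    if result.get("isError"):
        return "error", text  # e.g. line busy
    return "answer", text


print(classify_ask_result({"content": [{"type": "text", "text": "Yes, ship it"}]}))
# ('answer', 'Yes, ship it')
```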

Errors:

| Error | Cause |
|---|---|
| Line busy | Another session owns the mic |
| sox not installed | The rec command is missing |
| Mic permission denied | The terminal is not authorized to use the microphone |
| No STT backend | Neither whisper.cpp nor Wispr is available |

Backward-Compat Aliases

All aliases share readOnlyHint: false, destructiveHint: false, openWorldHint: false.

| Alias | Maps to | idempotentHint |
|---|---|---|
| qa_voice_announce | voice_speak(mode='announce') | true |
| qa_voice_brief | voice_speak(mode='brief') | true |
| qa_voice_consult | voice_speak(mode='consult') | true |
| qa_voice_say | voice_speak(mode='announce') | true |
| qa_voice_think | voice_speak(mode='think'), takes a thought parameter | false |
| qa_voice_replay | voice_speak(replay_index=N) | true |
| qa_voice_toggle | voice_speak(enabled=bool) | true |
| qa_voice_converse | voice_ask | false |
| qa_voice_ask | voice_ask | false |
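When migrating callers off the old names, the alias table translates into a lookup that rewrites a legacy call into its primary-tool equivalent. This is a client-side sketch with a hypothetical `resolve_alias` helper. Two details are assumptions because the table does not spell them out: qa_voice_think's thought argument is copied into message, and the arguments for qa_voice_replay and qa_voice_toggle are passed through unchanged.

```python
# Alias -> (primary tool, fixed arguments), transcribed from the alias table.
ALIASES = {
    "qa_voice_announce": ("voice_speak", {"mode": "announce"}),
    "qa_voice_brief": ("voice_speak", {"mode": "brief"}),
    "qa_voice_consult": ("voice_speak", {"mode": "consult"}),
    "qa_voice_say": ("voice_speak", {"mode": "announce"}),
    "qa_voice_think": ("voice_speak", {"mode": "think"}),
    "qa_voice_replay": ("voice_speak", {}),
    "qa_voice_toggle": ("voice_speak", {}),
    "qa_voice_converse": ("voice_ask", {}),
    "qa_voice_ask": ("voice_ask", {}),
}


def resolve_alias(name: str, arguments: dict) -> tuple[str, dict]:
    """Map a legacy tool call onto voice_speak or voice_ask (hypothetical helper)."""
    if name not in ALIASES:
        return name, arguments  # already a primary tool name (or unknown)
    target, fixed = ALIASES[name]
    args = dict(arguments)
    if name == "qa_voice_think" and "thought" in args:
        # Assumption: the legacy thought parameter becomes voice_speak's message.
        args["message"] = args.pop("thought")
    args.update(fixed)
    return target, args


print(resolve_alias("qa_voice_brief", {"message": "Lint clean"}))
# ('voice_speak', {'message': 'Lint clean', 'mode': 'brief'})
```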

Error Handling

All tools return errors in MCP format:

```json
{
  "content": [{ "type": "text", "text": "Error message here" }],
  "isError": true
}
```

Tools never throw exceptions: every error is caught and returned as a structured response, and is also logged to stderr for debugging.
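The catch-and-return behavior can be pictured as a wrapper around each tool handler. This Python sketch only illustrates the pattern; VoiceLayer's own implementation may differ, and `as_mcp_tool` and the sample `speak` handler are hypothetical.

```python
import sys
from typing import Any, Callable


def as_mcp_tool(handler: Callable[..., str]) -> Callable[..., dict]:
    """Wrap a handler so it always returns an MCP result dict and never raises."""
    def wrapped(**arguments: Any) -> dict:
        try:
            text = handler(**arguments)
            return {"content": [{"type": "text", "text": text}]}
        except Exception as exc:  # convert any failure into a structured error
            print(f"[tool error] {handler.__name__}: {exc}", file=sys.stderr)
            return {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    return wrapped


@as_mcp_tool
def speak(message: str) -> str:
    # Hypothetical handler: rejects blank input, otherwise reports what it "spoke".
    if not message.strip():
        raise ValueError("Empty message")
    return f'[announce] Spoke: "{message}"'


print(speak(message="   "))
# {'content': [{'type': 'text', 'text': 'Empty message'}], 'isError': True}
```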

Prerequisites Summary

| Tool | Depends on |
|---|---|
| voice_speak (TTS modes) | python3 + edge-tts, an audio player |
| voice_ask | python3 + edge-tts, an audio player, sox, and an STT backend (whisper.cpp or Wispr) |
| voice_speak (think mode) | Nothing (file system only) |
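A quick way to see which prerequisites are present is to look them up on PATH. The command names here are assumptions to adjust for your setup: edge-tts installs an `edge-tts` command and sox provides `rec`, while the audio player and whisper.cpp binary names vary by platform and version. Wispr is not checked.

```python
import shutil
from typing import Optional

# Candidate command names per prerequisite (assumptions; adjust for your system).
PREREQUISITES = {
    "python3": ["python3"],
    "edge-tts": ["edge-tts"],
    "audio player": ["afplay", "mpv", "ffplay"],
    "sox (rec)": ["rec"],
    "whisper.cpp": ["whisper-cli", "whisper-cpp"],
}


def first_on_path(candidates: list[str]) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for cmd in candidates:
        path = shutil.which(cmd)
        if path:
            return path
    return None


def check_prerequisites(prereqs: dict[str, list[str]] = PREREQUISITES) -> dict[str, Optional[str]]:
    """Map each prerequisite name to the executable found for it, or None."""
    return {name: first_on_path(candidates) for name, candidates in prereqs.items()}


for name, path in check_prerequisites().items():
    print(f"{name:14} {path or 'MISSING'}")
```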