MCP Tools Reference¶

VoiceLayer exposes 2 primary tools and 9 backward-compat aliases (11 total). All tools include MCP ToolAnnotations.

voice_speak¶

Non-blocking text-to-speech. Speaks a message aloud or logs it silently. Auto-detects mode from message content if mode is omitted.

Property	Value
Blocking	No
Requires mic	No
Session booking	No
readOnlyHint	`false`
destructiveHint	`false`
idempotentHint	`true`
openWorldHint	`false`

Parameters:

Name	Type	Required	Default	Description
`message`	`string`	Yes	—	Text to speak or log (must be non-empty after trimming)
`mode`	`string`	No	`auto`	`announce`, `brief`, `consult`, `think`, or `auto` (auto-detect from content)
`voice`	`string`	No	`jenny`	Profile name or raw edge-tts voice ID
`rate`	`string`	No	(per-mode)	Speech rate (e.g., `-10%`, `+5%`). Pattern: `^[+-]\d+%$`
`category`	`string`	No	`insight`	For think mode: `insight`, `question`, `red-flag`, `checklist-update`
`replay_index`	`number`	No	—	Replay cached message (0 = most recent). Ignores message.
`enabled`	`boolean`	No	—	Toggle voice on/off instead of speaking
`scope`	`string`	No	`all`	Toggle scope: `all`, `tts`, or `mic` (only with `enabled`)

Mode auto-detection: insight:, note:, TODO: → think; ? or "about to" → consult; >280 chars → brief; default → announce.

Returns: [mode] Spoke: "message" or Noted (category): thought for think mode. Errors: Empty message, edge-tts not installed, audio player missing

voice_ask¶

Blocking voice Q&A. Auto-waits for any playing voice_speak audio to finish, then speaks a question aloud, records mic at device's native rate (auto-detected), resamples to 16kHz, transcribes via Silero VAD + whisper.cpp/Wispr Flow, returns text.

Property	Value
Blocking	Yes
Requires mic	Yes
Session booking	Yes (auto-books on first call)
Auto-waits	Yes (waits for prior `voice_speak` playback)
readOnlyHint	`false`
destructiveHint	`false`
idempotentHint	`false`
openWorldHint	`false`

Parameters:

Name	Type	Required	Default	Description
`message`	`string`	Yes	—	Question to speak aloud (must be non-empty)
`timeout_seconds`	`number`	No	`30`	Max wait time (clamped to 5-3600)
`silence_mode`	`string`	No	`thoughtful`	`quick` (0.5s), `standard` (1.5s), or `thoughtful` (2.5s)
`press_to_talk`	`boolean`	No	`false`	Push-to-talk mode — no VAD, stop via signal file only

Returns (success): The user's transcribed text (plain string) Returns (timeout): [converse] No response received within N seconds. Returns (busy): [converse] Line is busy — voice session owned by... (with isError: true)

Errors:

Error	Cause
Line busy	Another session has the mic
sox not installed	`rec` command missing
Mic permission denied	Terminal not authorized for mic
No STT backend	Neither whisper.cpp nor Wispr available

Backward-Compat Aliases¶

All aliases share readOnlyHint: false, destructiveHint: false, openWorldHint: false.

Alias	Maps To	idempotent
`qa_voice_announce`	`voice_speak(mode='announce')`	true
`qa_voice_brief`	`voice_speak(mode='brief')`	true
`qa_voice_consult`	`voice_speak(mode='consult')`	true
`qa_voice_say`	`voice_speak(mode='announce')`	true
`qa_voice_think`	`voice_speak(mode='think')` (uses `thought` param)	false
`qa_voice_replay`	`voice_speak(replay_index=N)`	true
`qa_voice_toggle`	`voice_speak(enabled=bool)`	true
`qa_voice_converse`	`voice_ask`	false
`qa_voice_ask`	`voice_ask`	false

Error Handling¶

All tools return errors in MCP format:

{
  "content": [{ "type": "text", "text": "Error message here" }],
  "isError": true
}

Tools never throw exceptions — all errors are caught and returned as structured responses. Errors are also logged to stderr for debugging.

Prerequisites Summary¶

Tool	Depends On
voice_speak (TTS modes)	`python3` + `edge-tts`, audio player
voice_ask	All of the above + `sox`, STT backend (whisper.cpp or Wispr)
voice_speak (think mode)	None (file system only)