Prerequisites

Everything VoiceLayer needs, with one-liner installs for each platform.

Required

Bun (JavaScript runtime)

VoiceLayer runs on Bun. Install it with:

curl -fsSL https://bun.sh/install | bash

Verify: bun --version should print 1.0.0 or higher.
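The version check above can be scripted. The following is a minimal sketch, not part of VoiceLayer: version_at_least is a hypothetical helper name, and it assumes a sort that supports the -V (version sort) flag, as GNU coreutils and recent BSD/macOS versions do.

```shell
# Succeeds when version $1 is at least version $2.
# Assumption: `sort -V` (version sort) is available on this system.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v bun >/dev/null 2>&1; then
  if version_at_least "$(bun --version)" "1.0.0"; then
    echo "bun OK: $(bun --version)"
  else
    echo "bun is older than 1.0.0: $(bun --version)"
  fi
else
  echo "bun not found on PATH"
fi
```

If bun is reported as missing right after installation, opening a new terminal session so that the updated PATH is picked up may help.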

sox (microphone recording)

sox provides the rec command used to capture audio from your microphone.

macOS:

brew install sox

Ubuntu/Debian:

sudo apt install sox

Fedora/RHEL:

sudo dnf install sox

Verify: rec --version should print version info.
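Beyond the version check, a short test recording confirms that the microphone itself works. This is a sketch under assumptions: record_sample is a hypothetical helper, the output path is arbitrary, and the mono 16 kHz settings are chosen for illustration rather than taken from VoiceLayer's configuration.

```shell
# Record a short clip with sox's rec, then play it back with sox's play.
# Usage: record_sample [output.wav] [seconds]
record_sample() {
  out="${1:-/tmp/voicelayer-mic-test.wav}"
  secs="${2:-3}"
  if ! command -v rec >/dev/null 2>&1; then
    echo "rec not found: install sox first" >&2
    return 1
  fi
  # -c 1: mono; -r 16000: 16 kHz sample rate (assumed); trim 0 N: stop after N seconds
  rec -c 1 -r 16000 "$out" trim 0 "$secs" && play "$out"
}
```

Run record_sample, speak for a few seconds, and listen to the playback. Silence in the playback usually points to a wrong input device or a missing microphone permission.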

edge-tts (text-to-speech)

A Python client for Microsoft Edge's online neural text-to-speech service. Free, no API key needed.

pip3 install edge-tts

Verify: python3 -m edge_tts --list-voices should print a list of voices.

Python 3 required

edge-tts is a Python package. Most systems have Python 3 pre-installed. If not: brew install python3 (macOS) or sudo apt install python3-pip (Ubuntu/Debian).
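To confirm that synthesis works end to end, a short phrase can be written to an MP3 file. This is a sketch: synth_sample is a hypothetical helper, the output path and the en-US-AriaNeural voice are illustrative choices, and the call needs network access because edge-tts talks to an online service.

```shell
# Synthesize a short phrase to an MP3 file using the edge_tts module.
# Usage: synth_sample [output.mp3] [text]
synth_sample() {
  out="${1:-/tmp/voicelayer-tts-test.mp3}"
  text="${2:-Hello from VoiceLayer}"
  if ! python3 -c 'import edge_tts' >/dev/null 2>&1; then
    echo "edge_tts is not importable by python3: run pip3 install edge-tts" >&2
    return 1
  fi
  # Any voice name printed by --list-voices can be substituted here.
  python3 -m edge_tts --voice en-US-AriaNeural --text "$text" --write-media "$out"
}
```

After running synth_sample, play the resulting file with afplay on macOS or with an MP3 player on Linux.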

Claude Code

VoiceLayer is an MCP server for Claude Code. Install Claude Code by following Anthropic's documentation.

Recommended

whisper.cpp (local speech-to-text)

Local transcription — fast on Apple Silicon (~300ms for a 5-second clip), no cloud dependency.

macOS:

brew install whisper-cpp

Linux: build from source (see the whisper.cpp repo).

Then download a model:

mkdir -p ~/.cache/whisper
curl -L -o ~/.cache/whisper/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin

Smaller models available

The large-v3-turbo model (~1.5 GB) gives the best accuracy. For faster downloads, use ggml-base.en.bin (~142 MB) — English only, slightly less accurate.

VoiceLayer auto-detects models in ~/.cache/whisper/. No config needed.
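To see which model files are in place, the cache directory can be listed. This is a sketch: list_whisper_models is a hypothetical helper, and the ggml-*.bin pattern is an assumption based on the file names above, not a description of how VoiceLayer's detection works.

```shell
# Print the whisper.cpp model files found in a directory (default: ~/.cache/whisper).
# Returns non-zero when no model file is present.
list_whisper_models() {
  dir="${1:-$HOME/.cache/whisper}"
  found=1
  for f in "$dir"/ggml-*.bin; do
    [ -f "$f" ] || continue   # skips the unexpanded pattern when nothing matches
    printf '%s\n' "$(basename "$f")"
    found=0
  done
  return "$found"
}

list_whisper_models || echo "no ggml-*.bin models found in ~/.cache/whisper"
```

With a model in place, a manual transcription test along the lines of whisper-cli -m <model file> -f <16 kHz WAV file> should print the recognized text.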

Audio player (Linux only)

macOS uses the built-in afplay. Linux needs one of these for MP3 playback:

sudo apt install mpv    # recommended
# or: sudo apt install mpg123
# or: sudo apt install ffmpeg  (provides ffplay)
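To check which of these players is already installed, the candidates can be probed in order. This is a sketch: pick_player is a hypothetical helper, and its preference order simply mirrors the list above; VoiceLayer's own selection logic may differ.

```shell
# Print the first MP3-capable player found on PATH, or return non-zero if none is found.
pick_player() {
  for p in mpv mpg123 ffplay; do
    if command -v "$p" >/dev/null 2>&1; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}

pick_player || echo "no MP3 player found: install mpv, mpg123, or ffmpeg"
```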

Optional

Wispr Flow (cloud STT fallback)

If you don't install whisper.cpp, VoiceLayer can use Wispr Flow as a cloud-based speech-to-text backend. Requires an API key:

export QA_VOICE_WISPR_KEY="your-api-key"

This is optional — whisper.cpp is preferred for speed and privacy.

Microphone Access (macOS)

On macOS, your terminal app needs microphone permission:

System Settings > Privacy & Security > Microphone — enable your terminal (iTerm2, Terminal.app, Warp, etc.)

First recording may prompt

The first time VoiceLayer tries to record, macOS will show a permission dialog. Grant it, then try again.

Quick Check

Run these to verify everything is ready:

bun --version          # Should print 1.x.x+
rec --version          # Should print sox version info
python3 -m edge_tts -h # Should print help text
whisper-cli --help     # Should print help (optional, v1.8.3+ binary name)

If all commands work, head to the Quick Start to connect VoiceLayer to Claude Code.
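The same checks can be bundled into one preflight script. This is a sketch: check is a hypothetical helper, and it only tests whether each command is on PATH, so it does not confirm that the edge_tts Python module is installed or that a whisper model has been downloaded.

```shell
# Count of required tools that are not on PATH.
missing=0

# Usage: check required|optional <command>
check() {
  kind="$1"
  cmd="$2"
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok       $cmd"
  elif [ "$kind" = required ]; then
    echo "MISSING  $cmd (required)"
    missing=$((missing + 1))
  else
    echo "skipped  $cmd (optional)"
  fi
}

check required bun
check required rec
check required python3
check optional whisper-cli

echo "$missing required tool(s) missing"
```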