
Prerequisites

Everything VoiceLayer needs, with one-liner installs for each platform.

Required

Bun (JavaScript runtime)

VoiceLayer runs on Bun. Install it with:

curl -fsSL https://bun.sh/install | bash

Verify: bun --version should print 1.x.x or higher.
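If you want a script to enforce the "1.x.x or higher" requirement, here is a minimal POSIX sh sketch. `bun_version_ok` is a hypothetical helper, not part of VoiceLayer or Bun, and it only compares the major version number.

```sh
# bun_version_ok (hypothetical helper, not part of VoiceLayer or Bun):
# succeeds when a version string such as "1.1.38" has major version >= 1.
bun_version_ok() {
  major=${1%%.*}                  # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;      # empty or non-numeric: reject
  esac
  [ "$major" -ge 1 ]
}

# Usage: check the Bun on PATH, if there is one.
if command -v bun >/dev/null 2>&1 && bun_version_ok "$(bun --version)"; then
  echo "bun OK: $(bun --version)"
else
  echo "bun is missing or older than 1.x"
fi
```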

sox (microphone recording)

sox provides the rec command used to capture audio from your microphone.

brew install sox         # macOS
sudo apt install sox     # Debian/Ubuntu
sudo dnf install sox     # Fedora

Verify: rec --version should print version info.

edge-tts (text-to-speech)

A Python client for Microsoft Edge's online neural text-to-speech service. Free and needs no API key, but it does require an internet connection.

pip3 install edge-tts

Verify: python3 -m edge_tts --list-voices should print a list of voices.

Python 3 required

edge-tts is a Python package, so you need Python 3 and pip3. Most systems have Python 3 pre-installed. If not, install it with brew install python3 (macOS) or sudo apt install python3-pip (Debian/Ubuntu; this also pulls in python3).

Claude Code

VoiceLayer is an MCP server for Claude Code. Install Claude Code from Anthropic's docs.

whisper.cpp (local speech-to-text)

Local transcription — fast on Apple Silicon (~300ms for a 5-second clip), no cloud dependency.

brew install whisper-cpp    # macOS

On Linux, build from source; see the whisper.cpp repo.

Then download a model:

mkdir -p ~/.cache/whisper
curl -fL -o ~/.cache/whisper/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin

Smaller models available

The large-v3-turbo model (~1.5 GB) gives the best accuracy. For a faster download, use ggml-base.en.bin (~142 MB) instead: it is English-only and slightly less accurate. Replace the filename in both places in the curl command above.

VoiceLayer auto-detects models in ~/.cache/whisper/. No config needed.
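VoiceLayer handles this detection itself. If you want to check by hand which model a search of that directory would turn up, here is a rough sketch. The helper name `find_whisper_model` and its preference order (large-v3-turbo, then base.en, then any other `ggml-*.bin`) are assumptions for illustration, not VoiceLayer's actual detection logic.

```sh
# find_whisper_model (hypothetical, NOT VoiceLayer's real detection code):
# print the path of a whisper.cpp model in DIR, trying larger models first.
find_whisper_model() {
  dir=${1:-"$HOME/.cache/whisper"}
  for name in ggml-large-v3-turbo.bin ggml-base.en.bin; do
    if [ -f "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  # Fall back to any other ggml-*.bin file in the directory.
  for f in "$dir"/ggml-*.bin; do
    if [ -f "$f" ]; then          # an unmatched glob stays literal, so test it
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Usage: report what is in the default cache directory.
if model=$(find_whisper_model); then
  echo "Found model: $model"
else
  echo "No ggml-*.bin model in ~/.cache/whisper"
fi
```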

Audio player (Linux only)

macOS uses the built-in afplay. Linux needs one of these for MP3 playback:

sudo apt install mpv    # recommended
# or: sudo apt install mpg123
# or: sudo apt install ffmpeg  (provides ffplay)
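To see which of these players a Linux machine already has, a small script can try them in the order listed, with mpv first since it is the recommended one. `pick_audio_player` is a hypothetical name, and this order is not necessarily the one VoiceLayer itself uses.

```sh
# pick_audio_player (hypothetical): print the first MP3 player found on PATH,
# trying mpv (recommended), then mpg123, then ffplay (from ffmpeg).
pick_audio_player() {
  for p in mpv mpg123 ffplay; do
    if command -v "$p" >/dev/null 2>&1; then
      echo "$p"
      return 0
    fi
  done
  return 1
}

# Usage
if player=$(pick_audio_player); then
  echo "MP3 player: $player"
else
  echo "No MP3 player found; install mpv, mpg123, or ffmpeg"
fi
```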

Optional

Wispr Flow (cloud STT fallback)

If you don't install whisper.cpp, VoiceLayer can use Wispr Flow as a cloud-based speech-to-text backend. Requires an API key:

export QA_VOICE_WISPR_KEY="your-api-key"

This is optional — whisper.cpp is preferred for speed and privacy.
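The choice between the two backends can be summarized as a quick check. This is a hypothetical sketch of the fallback described here, not VoiceLayer's code: it only looks for a `whisper-cli` binary (the name used in the Quick Check section) and a non-empty `QA_VOICE_WISPR_KEY`, and it does not verify that a model has been downloaded.

```sh
# stt_backend (hypothetical): print which speech-to-text backend looks usable.
# Prefers local whisper.cpp; falls back to Wispr Flow if an API key is set.
stt_backend() {
  if command -v whisper-cli >/dev/null 2>&1; then
    echo "whisper.cpp"
  elif [ -n "${QA_VOICE_WISPR_KEY:-}" ]; then
    echo "wispr"
  else
    echo "none"
    return 1
  fi
}

# Usage
stt_backend || echo "Install whisper.cpp or set QA_VOICE_WISPR_KEY"
```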

Microphone Access (macOS)

On macOS, your terminal app needs microphone permission:

System Settings > Privacy & Security > Microphone — enable your terminal (iTerm2, Terminal.app, Warp, etc.)

First recording may prompt

The first time VoiceLayer tries to record, macOS will show a permission dialog. Grant it, then try again.

Quick Check

Run these to verify everything is ready:

bun --version          # Should print 1.x.x+
rec --version          # Should print sox version info
python3 -m edge_tts -h # Should print help text
whisper-cli --help     # Should print help (optional, v1.8.3+ binary name)
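To run these checks in one go and see at a glance what is missing, a sketch like the following works in any POSIX shell. It tests whether commands are on PATH rather than running them, checks edge-tts by importing the Python module, and treats `whisper-cli` as optional. `report_tools` is a hypothetical helper name.

```sh
# report_tools (hypothetical): print "ok: NAME" or "MISSING: NAME" for each
# command, returning non-zero if any of them is not on PATH.
report_tools() {
  rc=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "MISSING: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Required commands.
report_tools bun rec python3 || echo "Install the missing required tools first."

# edge-tts is a Python module, so check that it imports.
if python3 -c 'import edge_tts' >/dev/null 2>&1; then
  echo "ok: edge_tts"
else
  echo "MISSING: edge_tts (pip3 install edge-tts)"
fi

# Optional: local speech-to-text.
report_tools whisper-cli || echo "(whisper-cli is optional if you use Wispr Flow)"
```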

If all commands work, head to the Quick Start to connect VoiceLayer to Claude Code.