Enrichment

BrainLayer enriches indexed chunks with structured metadata using a local LLM. Think of it as a librarian cataloging every conversation snippet.

Chunk Enrichment

Each chunk gets 10 metadata fields:

| Field | Description | Example |
| --- | --- | --- |
| summary | 1-2 sentence gist | "Debugging Telegram bot message drops under load" |
| tags | Topic tags (comma-separated) | "telegram, debugging, performance" |
| importance | Relevance score, 1-10 | 8 (architectural decision) vs. 2 (directory listing) |
| intent | What was happening | debugging, designing, implementing, configuring, deciding, reviewing |
| primary_symbols | Key code entities | "TelegramBot, handleMessage, grammy" |
| resolved_query | Question this chunk answers (HyDE-style) | "How does the Telegram bot handle rate limiting?" |
| epistemic_level | How well-proven the content is | hypothesis, substantiated, validated |
| version_scope | System-state context | "grammy 1.32, Node 22" |
| debt_impact | Technical-debt signal | introduction, resolution, none |
| external_deps | Libraries/APIs mentioned | "grammy, Supabase, Railway" |
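
For concreteness, a hypothetical enrichment record for one chunk might look like the following (field names from the table above; the shape and values are illustrative, not BrainLayer's actual storage schema):

enriched_chunk = {
    "summary": "Debugging Telegram bot message drops under load",
    "tags": "telegram, debugging, performance",
    "importance": 8,
    "intent": "debugging",
    "primary_symbols": "TelegramBot, handleMessage, grammy",
    "resolved_query": "How does the Telegram bot handle rate limiting?",
    "epistemic_level": "substantiated",
    "version_scope": "grammy 1.32, Node 22",
    "debt_impact": "none",
    "external_deps": "grammy, Supabase, Railway",
}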

Running Enrichment

# Basic (50 chunks at a time)
brainlayer enrich

# Larger batches
brainlayer enrich --batch-size=100

# Process up to 5000 chunks
brainlayer enrich --max=5000

# With parallel workers
brainlayer enrich --parallel=3
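
Conceptually, each pass fetches a batch of un-enriched chunks, prompts the local LLM for the metadata fields, and writes the results back. A minimal Python sketch of the per-chunk call, assuming the Ollama backend's /api/generate endpoint and the glm4 model tag used below (an illustration, not BrainLayer's actual code):

import json
import requests

FIELDS = ("summary, tags, importance, intent, primary_symbols, resolved_query, "
          "epistemic_level, version_scope, debt_impact, external_deps")

def enrich_chunk(text: str) -> dict:
    # Ask the local model to return the ten metadata fields as JSON.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "glm4",
            "prompt": f"Return a JSON object with keys ({FIELDS}) describing:\n\n{text}",
            "format": "json",   # Ollama constrains the output to valid JSON
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])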

Source-Aware Thresholds

Not all chunks are worth enriching. BrainLayer automatically skips chunks that are too short:

| Source | Minimum length | Reason |
| --- | --- | --- |
| Claude Code | 50 characters | Code context needs substance |
| WhatsApp / Telegram | 15 characters | Short messages can still be meaningful |

Skipped chunks are tagged `skipped:too_short` and excluded from enrichment stats.
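
The gate itself is a simple per-source length check. A sketch of the idea (thresholds from the table above; function and field names are hypothetical):

MIN_LENGTH = {"claude_code": 50, "whatsapp": 15, "telegram": 15}

def should_enrich(chunk: dict) -> bool:
    # Skip chunks below the per-source minimum and tag them accordingly.
    threshold = MIN_LENGTH.get(chunk["source"], 50)
    if len(chunk["text"].strip()) < threshold:
        chunk["tags"] = "skipped:too_short"   # excluded from enrichment stats
        return False
    return True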

Session Enrichment

Session-level analysis extracts structured insights from entire conversations:

brainlayer enrich-sessions
brainlayer enrich-sessions --project my-project --since 2026-01-01
brainlayer enrich-sessions --stats   # Show progress

Session enrichment extracts:

  • Summary — what the session was about
  • Decisions — architectural and implementation choices made
  • Corrections — mistakes caught and fixed
  • Learnings — new knowledge gained
  • Patterns — recurring approaches identified
  • Quality scores — code quality, communication quality
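
A hypothetical session-level record mirroring those bullets (values invented purely for illustration):

session_insights = {
    "summary": "Migrated the bot from long polling to webhooks",
    "decisions": ["Use grammY's webhookCallback instead of polling"],
    "corrections": ["Acks were sent before the DB commit; reordered"],
    "learnings": ["The hosting platform closes idle connections after a few minutes"],
    "patterns": ["Exponential backoff on transient Telegram errors"],
    "quality_scores": {"code": 8, "communication": 7},
}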

LLM Backends

Two local backends are supported:

| Backend | Best for | Speed | How to start |
| --- | --- | --- | --- |
| MLX | Apple Silicon (M1/M2/M3) | 21-87% faster | `python3 -m mlx_lm.server --model mlx-community/Qwen2.5-Coder-14B-Instruct-4bit --port 8080` |
| Ollama | Any platform | ~1 s/chunk (short), ~13 s (long) | `ollama serve`, then `ollama pull glm4` |

Backend is auto-detected: Apple Silicon defaults to MLX, everything else to Ollama. Override with:

BRAINLAYER_ENRICH_BACKEND=mlx brainlayer enrich
BRAINLAYER_ENRICH_BACKEND=ollama brainlayer enrich
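
The detection boils down to a platform check plus the environment-variable override. A plausible sketch (not BrainLayer's actual code):

import os
import platform

def detect_backend() -> str:
    override = os.environ.get("BRAINLAYER_ENRICH_BACKEND")
    if override:
        return override  # "mlx" or "ollama"
    # Apple Silicon reports Darwin/arm64; default to MLX there
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"
    return "ollama"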

Performance Tips

  • Set "think": false in Ollama API calls — GLM-4.7 defaults to thinking mode, adding 350+ tokens and 20s delay for no benefit
  • Use PYTHONUNBUFFERED=1 for log visibility in background processes
  • MLX parallel workers: each gets its own DB connection (thread-local)
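
For the first tip: recent Ollama versions expose a think flag on the chat and generate endpoints for thinking-capable models. A minimal example, reusing the glm4 model tag from the backends table (illustrative, not BrainLayer's code):

import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "glm4",
        "messages": [{"role": "user", "content": "Summarize this chunk: ..."}],
        "think": False,   # skip the thinking phase entirely
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])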

Stall Detection

If enriching a chunk takes longer than the stall timeout (default: 5 minutes), that chunk is automatically killed and skipped:

BRAINLAYER_STALL_TIMEOUT=300 brainlayer enrich  # 5 min default

Progress is logged every N chunks:

BRAINLAYER_HEARTBEAT_INTERVAL=25 brainlayer enrich  # Log every 25 chunks
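
Together, the two settings amount to running each chunk under a timeout and logging a heartbeat every N chunks. A rough Python sketch of the pattern (names hypothetical; note that a Python thread cannot truly be killed, so a real implementation would enrich in a subprocess it can terminate):

import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError

STALL_TIMEOUT = int(os.environ.get("BRAINLAYER_STALL_TIMEOUT", "300"))
HEARTBEAT = int(os.environ.get("BRAINLAYER_HEARTBEAT_INTERVAL", "25"))

def run_with_stall_detection(chunks, enrich_one):
    with ThreadPoolExecutor(max_workers=1) as pool:
        for i, chunk in enumerate(chunks, 1):
            try:
                pool.submit(enrich_one, chunk).result(timeout=STALL_TIMEOUT)
            except TimeoutError:
                print(f"chunk {chunk['id']} stalled after {STALL_TIMEOUT}s; skipping")
            if i % HEARTBEAT == 0:
                print(f"heartbeat: {i} chunks processed")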