Local LLM Models Guide

Last updated: 2026-01-26
Hardware: MacBook Pro M1 Pro, 32GB RAM


Quick Reference: Models for Your Mac (32GB RAM)

Model               Size   Speed                     Best For
qwen3-coder         19GB   Fast (MoE, 3.3B active)   Agentic coding, best overall
qwen2.5-coder:32b   20GB   ~15-20 tok/s              High-quality code gen
qwen2.5-coder:14b   9GB    ~50 tok/s                 Fast coding tasks
devstral            14GB   ~30 tok/s                 Codebase navigation, file ops
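
Want all four? That's roughly 62GB of downloads. A quick loop over the model names from the table above:

# pull everything in the quick-reference table (~62GB total)
for m in qwen3-coder qwen2.5-coder:32b qwen2.5-coder:14b devstral; do
  ollama pull "$m"
done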

Model Details

qwen3-coder

ollama pull qwen3-coder
  • Architecture: MoE (Mixture of Experts)
  • Parameters: 30.5B total, only 3.3B activated
  • Size: ~19GB
  • RAM needed: 32GB (perfect for your Mac)
  • Why it's good: quality of a ~30B model at the speed of a small one, since only 3.3B parameters are active per token
  • Best for: Agentic coding, multi-step tasks, code analysis
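
For long agentic sessions, a bigger context window helps; Ollama's interactive /set command changes it per session. The 32768 below is just an example value (larger contexts use more RAM):

ollama run qwen3-coder
>>> /set parameter num_ctx 32768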

qwen2.5-coder:32b

ollama pull qwen2.5-coder:32b
  • Architecture: Dense transformer
  • Parameters: 32.5B
  • Size: ~20GB
  • RAM needed: 24-32GB
  • Benchmarks: best open-source results on EvalPlus, LiveCodeBench, and BigCodeBench
  • Best for: High quality code generation, complex problems
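
To see what the 32B model actually does on your hardware, run with --verbose, which prints timing stats (including eval rate in tok/s) after each response. The prompt is illustrative:

ollama run qwen2.5-coder:32b --verbose "Implement an LRU cache in Python."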

qwen2.5-coder:14b

ollama pull qwen2.5-coder:14b
  • Parameters: 14B
  • Size: ~9GB
  • Speed: ~50+ tok/s on M1 Pro
  • Best for: Fast iterations, simpler tasks, when speed matters
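
The speed makes it handy for one-shot generation straight from the shell (example prompt):

ollama run qwen2.5-coder:14b "Write a bash one-liner that lists the 10 largest files under the current directory."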

devstral

ollama pull devstral
  • By: Mistral AI
  • Size: ~14GB
  • Specialty: File system operations, code navigation, large codebases
  • Best for: Codebase exploration, documentation audits, multi-file edits
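
Since devstral's strength is cross-file work, piping several files in at once plays to it; the paths here are illustrative:

cat src/*.py | ollama run devstral "Map out how these modules depend on each other."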

Benchmark Comparison (SWE-Bench Verified)

Model             Score   Type
Claude 4 Sonnet   72.7%   Closed-source
Claude 4 Opus     72.5%   Closed-source
Qwen3-Coder       69.6%   Open-source
DeepSeek-V3       ~68%    Open-source

Takeaway: the best open-source model is now only ~3 percentage points behind Claude on this benchmark.


Models That WON'T Fit (32GB RAM)

Model                  Size        Issue
qwen3-coder:480b       163-368GB   Way too big
deepseek-v3            400GB+      Won't fit
Any 70B+ dense model   40GB+       Will swap, very slow

Usage Patterns

For Code Research/Documentation

ollama run qwen3-coder
# or
ollama run devstral

For Fast Code Generation

ollama run qwen2.5-coder:14b

For Highest Quality (Slower)

ollama run qwen2.5-coder:32b

Running with Context

Pipe a file

cat README.md | ollama run qwen3-coder "Analyze this README and suggest improvements"

Interactive session with system prompt

ollama run qwen3-coder --system "You are a senior software architect auditing a codebase."
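
Call the local API

Everything above also works over Ollama's local HTTP API (default port 11434), which is handy for scripting. A minimal sketch; the prompt is illustrative:

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder",
  "prompt": "Summarize what a Mixture of Experts model is.",
  "stream": false
}'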

Free Cloud Alternatives (When Local Won't Cut It)

Service       Models             Limit
Gemini CLI    Gemini 2.5 Pro     1,000 req/day
Groq          Llama 3.3 70B      Very fast, free tier
Together.ai   All major models   $25 free credit
OpenRouter    Everything         Pay-per-use
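
Most of these expose an OpenAI-compatible API. A sketch against OpenRouter, assuming OPENROUTER_API_KEY is set (check their model list for the exact ID; qwen/qwen3-coder is a guess at the slug):

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen/qwen3-coder", "messages": [{"role": "user", "content": "Hello"}]}'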

Check What's Installed

ollama list
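
To see which models are loaded in memory right now (and how much RAM each is using):

ollama ps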

Check Available Space

df -h /
du -sh ~/.ollama/models/
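
If space is tight, remove models you no longer use (the model name here is just an example):

ollama rm qwen2.5-coder:14b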
