Ollama v0.19 Switches to Apple's MLX: 93% Faster Decode on Apple Silicon | Tool Update
Published on MyPrivateClaw: Apr 3, 2026, 7:47 PM UTC
Coverage date: Apr 3, 2026
Last updated: Apr 4, 2026, 5:23 AM UTC
News summary
Ollama v0.19 ships a foundational change for Apple Silicon users: the inference backend on macOS now runs on Apple's MLX framework rather than the previous Metal path. The headline number is a 93% improvement in decode throughput, but the more durable value is in the architectural changes that come with it: smarter caching and a memory scheduler that finally handles heterogeneous GPU configurations without manual intervention.

The performance gain is not marginal, and it is not limited to decode. On Gemma 3 12B, prompt evaluation speed jumped from 127 tokens/second to 1,380 tokens/second, roughly an 11x improvement, driven by how MLX handles layer loading and memory layout on the Unified Memory Architecture. For local inference workflows that involve long context reads at the start of each session, such as document summarization, code review agents, RAG pipelines with large retrieved…
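To put the quoted prompt evaluation figures in practical terms, here is a minimal back-of-the-envelope sketch. The two throughput values come from the article; the 32,000-token prompt size and the helper names are hypothetical, chosen only for illustration, and the estimate assumes throughput stays constant over the whole prompt.

```python
# Back-of-the-envelope estimate of prompt evaluation (prefill) wait time.
# Throughput figures are the Gemma 3 12B numbers quoted in the article;
# PROMPT_TOKENS is a hypothetical long-context prompt, not a measured case.

OLD_TPS = 127.0        # tokens/second before v0.19 (from the article)
NEW_TPS = 1380.0       # tokens/second on the MLX backend (from the article)
PROMPT_TOKENS = 32_000 # assumption: size of a long document or RAG context


def speedup(before_tps: float, after_tps: float) -> float:
    """Return the throughput ratio (new / old)."""
    return after_tps / before_tps


def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Return the time to read a prompt at a constant throughput."""
    return prompt_tokens / tokens_per_second


ratio = speedup(OLD_TPS, NEW_TPS)
old_wait = prefill_seconds(PROMPT_TOKENS, OLD_TPS)
new_wait = prefill_seconds(PROMPT_TOKENS, NEW_TPS)

print(f"prompt evaluation speedup: {ratio:.1f}x")
print(f"wait before first token, old backend: {old_wait:.0f} s")
print(f"wait before first token, MLX backend: {new_wait:.1f} s")
```

Under these assumptions, the wait before the first generated token drops from about four minutes to under half a minute, which is why the prompt evaluation number matters more than the decode number for sessions that start with a long context read.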