MyPrivateClaw

Ollama v0.19 Switches to Apple's MLX: 93% Faster Decode on Apple Silicon | Tool Update

Published on MyPrivateClaw: Apr 3, 2026, 7:47 PM UTC

Coverage date: Apr 3, 2026

Last updated: Apr 4, 2026, 5:23 AM UTC

News summary

Ollama v0.19 ships a foundational change for Apple Silicon users: the inference backend on macOS now runs on Apple's MLX framework rather than the previous Metal path. The headline number is a 93% improvement in decode throughput, but the more durable value is in the architectural changes that come with it: smarter caching and a memory scheduler that finally handles heterogeneous GPU configurations without manual intervention.

The gains are not marginal, and they are not limited to decode. On Gemma 3 12B, prompt evaluation speed jumped from 127 tokens/second to 1,380 tokens/second, a roughly 10.9x improvement driven by how MLX handles layer loading and memory layout on Apple's unified memory architecture. For local inference workflows that involve long context reads at the start of each session, such as document summarization, code review agents, RAG pipelines with large retrieved…
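To put the prompt-evaluation figures in practical terms, the following is a minimal Python sketch of the arithmetic. The two rates come from the summary above; the 32,000-token prompt length is a hypothetical value chosen only for illustration, and the helper names are made up for this example. The last helper assumes you are reading the `prompt_eval_count` and `prompt_eval_duration` fields that Ollama's API reports in its responses (the duration is in nanoseconds), which is one way to check the numbers on your own machine.

```python
# Prompt-evaluation rates quoted in the summary for Gemma 3 12B (tokens/second).
OLD_RATE = 127.0
NEW_RATE = 1380.0

NANOSECONDS_PER_SECOND = 1_000_000_000


def speedup(old_rate: float, new_rate: float) -> float:
    """Ratio of new throughput to old throughput."""
    return new_rate / old_rate


def prefill_seconds(prompt_tokens: int, rate: float) -> float:
    """Seconds needed to read a prompt of the given length at the given rate."""
    return prompt_tokens / rate


def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration into tokens/second.

    Intended for values such as prompt_eval_count and prompt_eval_duration
    taken from an Ollama API response.
    """
    return token_count / (duration_ns / NANOSECONDS_PER_SECOND)


if __name__ == "__main__":
    # Hypothetical prompt size, not a figure from the article.
    prompt_tokens = 32_000

    print(f"speedup: {speedup(OLD_RATE, NEW_RATE):.1f}x")
    print(f"old prefill: {prefill_seconds(prompt_tokens, OLD_RATE):.0f} s")
    print(f"new prefill: {prefill_seconds(prompt_tokens, NEW_RATE):.0f} s")
```

Under that assumed prompt length, the wait before the first generated token drops from about 252 seconds to about 23 seconds, which is why the prefill gain matters more than the decode gain for long-context sessions.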