MyPrivateClaw
Ollama v0.20.4: MLX M5 Performance Boost and Gemma 4 Flash Attention | Tool Update
Published on MyPrivateClaw
Apr 8, 2026, 8:54 PM UTC
Coverage date
Apr 7, 2026
Last updated
Apr 8, 2026, 8:54 PM UTC
News summary
Ollama released v0.20.4 with two targeted fixes: improved MLX inference performance on Apple M5 chips via NAX, and flash attention enabled for Gemma 4 models. Users on Apple Silicon M5 hardware will see faster inference without any configuration changes. Enabling flash attention for Gemma 4 reduces VRAM pressure on memory-constrained setups.
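Since both changes apply without configuration, the practical question is whether a local install is already on v0.20.4 or later. The following is a minimal sketch of that check, not an official tool. It assumes an Ollama server is running on the default local address `http://localhost:11434` and answers on the `/api/version` endpoint; the helper names `parse_version`, `has_release`, and `local_ollama_version` are hypothetical and chosen for illustration.

```python
import json
import urllib.request

# Release covered in this summary.
REQUIRED = (0, 20, 4)


def parse_version(text):
    """Turn '0.20.4', 'v0.20.4' or '0.21.0-rc1' into a tuple of ints."""
    # Drop a leading 'v' and any pre-release or build suffix.
    core = text.lstrip("v").split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def has_release(version_text, required=REQUIRED):
    """Return True if version_text is at least the required release."""
    return parse_version(version_text) >= required


def local_ollama_version(base_url="http://localhost:11434"):
    """Ask a running Ollama server for its version string.

    Assumption: the server listens on base_url and returns JSON
    with a 'version' field from /api/version.
    """
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]
```

With a server running, `has_release(local_ollama_version())` returns `True` once the install includes the v0.20.4 changes. The comparison uses integer tuples rather than strings so that, for example, `0.9.12` is correctly treated as older than `0.20.4`.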