MyPrivateClaw

Ollama v0.20.4: MLX M5 Performance Boost and Gemma 4 Flash Attention | Tool Update

Published on MyPrivateClaw

Apr 8, 2026, 8:54 PM UTC

Coverage date

Apr 7, 2026

Last updated

Apr 8, 2026, 8:54 PM UTC

News summary

Ollama released v0.20.4 with two targeted changes: improved MLX inference performance on Apple M5 chips via NAX, and flash attention enabled for Gemma 4 models. Users on Apple Silicon M5 hardware should see faster inference without any configuration changes. Enabling flash attention for Gemma 4 reduces VRAM pressure on memory-constrained setups.
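
Since both changes arrive without configuration, the practical question is whether a given install is already on v0.20.4 or later. The sketch below is one way to check, not an official tool. It assumes a local Ollama server on the default address `http://localhost:11434` and reads the version from the `GET /api/version` endpoint. The helper names `parse_version`, `has_release`, and `fetch_local_version` are hypothetical and chosen for illustration.

```python
import json
import re
import urllib.request

# Release that ships the MLX M5 and Gemma 4 flash attention changes.
MIN_VERSION = (0, 20, 4)


def parse_version(text):
    """Extract the leading major.minor.patch numbers from a version string.

    Any suffix (for example a pre-release tag) is ignored, so a release
    candidate of 0.20.4 is treated the same as 0.20.4 itself.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_release(version_text, minimum=MIN_VERSION):
    """Return True if version_text is at least the minimum release."""
    return parse_version(version_text) >= minimum


def fetch_local_version(base_url="http://localhost:11434"):
    """Ask a running Ollama server for its version string.

    base_url is an assumption: adjust it if the server listens elsewhere.
    """
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as response:
        return json.load(response)["version"]
```

With a server running, `has_release(fetch_local_version())` reports whether the install includes this release. The comparison uses integer tuples rather than strings so that, for example, 0.20.4 correctly sorts above 0.9.x.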