MyPrivateClaw

Google Releases Gemma 4 — Multimodal On-Device Models from 2.3B to 31B Parameters | Model Release

Google DeepMind released Gemma 4, a family of four multimodal models (E2B, E4B, 26B MoE, 31B dense) under the Apache 2.0 license. All models support image and text…

Published on MyPrivateClaw

Apr 5, 2026, 8:11 AM UTC

Coverage date

Apr 2, 2026

Last updated

Apr 5, 2026, 8:32 AM UTC

News summary

Google DeepMind released Gemma 4 on April 2, 2026, with four model sizes available immediately on Hugging Face under the Apache 2.0 license. The family spans Gemma 4 E2B (2.3B effective parameters, 5.1B with embeddings), E4B (4.5B effective, 8B with embeddings), a 26B mixture-of-experts model with only 4B active parameters, and a 31B dense model. All four are available as base and instruction-tuned variants, with context windows of 128k to 256k tokens.

The architecture introduces two notable efficiency features. Per-Layer Embeddings adds a small dedicated conditioning vector for each decoder layer, giving each layer its own channel to receive token-specific information. Shared KV Cache has the last N layers reuse key-value states from earlier layers, reducing memory and compute for long-context inference, which is directly relevant for running 128k+ context sessions on consumer hardware. For local AI operators, t…
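
To make the Shared KV Cache claim concrete, the rough sketch below estimates how much cache memory is saved when the last N layers reuse key-value states instead of storing their own. The helper `kv_cache_bytes` and every number in the configuration (layer count, KV heads, head dimension, number of shared layers) are hypothetical placeholders chosen for easy arithmetic, not Gemma 4's published values, and the estimate ignores other cache optimizations a real model may use.

```python
def kv_cache_bytes(num_layers, shared_last_n, num_kv_heads, head_dim,
                   context_len, bytes_per_value=2):
    """Estimate KV cache size when the last `shared_last_n` layers
    reuse key/value states from earlier layers (illustrative only)."""
    if not 0 <= shared_last_n < num_layers:
        raise ValueError("shared_last_n must be in [0, num_layers)")
    # Only layers that keep their own K/V tensors consume cache memory.
    layers_storing_kv = num_layers - shared_last_n
    # Factor 2: one key tensor plus one value tensor per storing layer.
    return (2 * layers_storing_kv * num_kv_heads * head_dim
            * context_len * bytes_per_value)


# Hypothetical configuration (assumed values, NOT Gemma 4's real settings).
cfg = dict(num_kv_heads=8, head_dim=128, context_len=128 * 1024)

baseline = kv_cache_bytes(num_layers=32, shared_last_n=0, **cfg)
shared = kv_cache_bytes(num_layers=32, shared_last_n=12, **cfg)

print(f"no sharing:     {baseline / 2**30:.1f} GiB")  # 16.0 GiB
print(f"last 12 shared: {shared / 2**30:.1f} GiB")    # 10.0 GiB
```

Under these assumed numbers, a 128k-token session with 16-bit cache values drops from 16 GiB to 10 GiB of cache, because the saving scales linearly with the fraction of layers that share.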