MyPrivateClaw

Raspberry Pi 5 (8GB) — Ultra-low-power edge inference — expect 1–6 tokens/second on small models


Why it matters

The Raspberry Pi 5 with 8GB of LPDDR4X is the minimum viable platform for running quantized LLMs at the edge. Using llama.cpp with CPU-only inference, it achieves 4–6 tokens/second on TinyLlama (1.1B) and 1–2 tokens/second on Llama 3 8B at Q4_K_M quantization. That is slow, but functional for offline, always-on assistant use cases. Power draw is only 5–12 W under load. The 8GB model is the only viable variant for LLM work.
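A quick way to see why 8B-class models need the 8GB board is to estimate the size of the quantized weights. The sketch below is a back-of-the-envelope check, not a measurement. Both the ~4.8 bits per weight average for Q4_K_M and the 1.5 GB allowance for the OS, KV cache and llama.cpp buffers are assumptions, and the helper names are hypothetical.

```python
# Rough memory-footprint check for a quantized model on a Raspberry Pi 5.
# ASSUMPTIONS (not from the source): ~4.8 bits per weight as an average for
# Q4_K_M, and a flat allowance for the OS, KV cache and llama.cpp buffers.

BITS_PER_WEIGHT_Q4_K_M = 4.8  # assumed average; real GGUF files vary by model
OVERHEAD_GB = 1.5             # hypothetical allowance for OS + KV cache + buffers


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits(params_billion: float, ram_gb: float,
         bits_per_weight: float = BITS_PER_WEIGHT_Q4_K_M,
         overhead_gb: float = OVERHEAD_GB) -> bool:
    """True if the weights plus the assumed overhead fit in the given RAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= ram_gb


if __name__ == "__main__":
    for name, params in [("TinyLlama 1.1B", 1.1), ("Llama 3 8B", 8.0)]:
        size = weights_gb(params, BITS_PER_WEIGHT_Q4_K_M)
        for ram in (4, 8):
            print(f"{name}: ~{size:.1f} GB weights, fits in {ram} GB: {fits(params, ram)}")
```

Under these assumptions an 8B model at Q4_K_M needs roughly 6.3 GB in total, which rules out a 4GB board but fits within 8GB.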

Best for

Edge deployments where higher latency is acceptable and ultra-low power consumption is the priority
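Since low power is the main selling point, it can help to express the quoted 5–12 W draw and 1–6 tokens/second throughput as energy per generated token (watts are joules per second, so J/token = W ÷ tokens/s). The sketch below only computes loose bounds from those quoted ranges; pairing the lowest power with the highest throughput (and vice versa) is an assumption, since per-model power figures are not given, and `joules_per_token` is a hypothetical helper.

```python
# Energy per generated token = power draw (W = J/s) / throughput (tokens/s).
# ASSUMPTION: the best/worst pairings below are loose bounds built from the
# quoted ranges, not measured per-model figures.


def joules_per_token(power_w: float, tokens_per_s: float) -> float:
    """Energy in joules spent per generated token."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return power_w / tokens_per_s


# Loose best case: lowest quoted power with highest quoted throughput.
best = joules_per_token(5.0, 6.0)
# Loose worst case: highest quoted power with lowest quoted throughput.
worst = joules_per_token(12.0, 1.0)

print(f"~{best:.2f} to ~{worst:.1f} J per token")
```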
