Google's Gemma 3 Now Runs on Consumer GPUs with Quantization-Aware Training
19-Apr-2025
Sundar Pichai announced that Google's Gemma 3, previously optimized for high-end H100 GPUs, can now run efficiently on a single *desktop* GPU. This is made possible by Quantization-Aware Training (QAT), a technique that simulates low-precision arithmetic during training so that the model stays accurate once its weights are stored at lower precision. The lower-precision weights take up far less memory while preserving high model quality. This makes Gemma 3 accessible to a wider range of developers, including those without access to expensive cloud infrastructure.
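To make the memory argument concrete, here is a minimal sketch of the two ideas involved: how weight memory scales with bits per weight, and the kind of "fake quantization" rounding that QAT commonly applies during training. It is an illustration, not Google's implementation. The 27-billion parameter count, the 16-bit and 4-bit precisions, and the function names `weight_memory_gb` and `fake_quantize` are assumptions chosen for the example.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, the KV cache, and any per-block scale factors.
    """
    return num_params * bits_per_weight / 8 / 1e9


def fake_quantize(values, bits=4):
    """Round floats to a symmetric signed integer grid, then map back to floats.

    QAT typically applies a step like this in the forward pass during training,
    so the model learns weights that tolerate the rounding error.
    """
    qmax = 2 ** (bits - 1) - 1                # e.g. 7 for 4-bit signed integers
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = []
    for v in values:
        q = round(v / scale)                  # nearest integer level
        q = max(-qmax - 1, min(qmax, q))      # clamp to the representable range
        quantized.append(q * scale)           # back to float ("dequantize")
    return quantized


if __name__ == "__main__":
    params = 27e9  # assumed parameter count, for illustration only
    print(weight_memory_gb(params, 16))  # 16-bit weights
    print(weight_memory_gb(params, 4))   # 4-bit weights
    print(fake_quantize([0.7, -0.35, 0.12, 0.0]))
```

Under these assumptions the weights shrink from 54 GB at 16 bits to 13.5 GB at 4 bits, a 4x reduction. Real quantized checkpoints are usually somewhat larger than this estimate because they also store scale factors and may keep some layers at higher precision.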
Gemma 3's move to consumer hardware is a notable step toward democratizing advanced AI capabilities. It lets smaller labs and individual developers experiment with a cutting-edge model on modest hardware, lowering the barrier to entry for AI innovation. More details are available on the official Google Developers Blog.