Google introduced Gemini 3.1 Flash-Lite as a “built for intelligence at scale” model aimed at teams that care about responsiveness and unit economics. Google says it’s rolling out in preview via the Gemini API in AI Studio and for enterprises via Vertex AI.
The market is shifting toward volume-layer models—the ones that run constantly behind apps, agents, and internal tools. If performance is “good enough,” the deciding factor becomes cost and speed. Flash-Lite is Google’s bet that the next platform wins come from shipping intelligence everywhere, not only at the flagship tier.