Gemini 2.5 Flash-Lite
Google · Gemini 2.5
Budget-oriented Gemini tier for large-scale assistant and automation workloads.
Overview
Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on February 15, 2026.
Gemini 2.5 Flash-Lite targets high-throughput workloads where cost control and response speed are primary constraints. Google positions it as the fastest Flash model optimized for cost efficiency and high throughput.
Capabilities
The model is practical for classification, extraction, concise summarization, translation, and routine assistant tasks. It can handle many day-to-day workflows when prompts are structured and outputs are validated.
Technical Details
Google’s current model docs list Gemini 2.5 Flash-Lite with a 1,048,576 token input window and a 65,536 token output limit. It supports the same broad input modalities and many of the same agent-oriented capabilities as Flash, but with a lower quality ceiling on difficult tasks.
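A pre-flight check against those documented limits can be sketched as follows. The 4-characters-per-token heuristic is a rough assumption for illustration; production code should use the API's own token-counting endpoint rather than this estimate.

```python
# Rough pre-flight budget check against the documented limits
# (1,048,576 input tokens, 65,536 output tokens). The 4-chars-per-token
# ratio is an approximation, not how the tokenizer actually works.
INPUT_TOKEN_LIMIT = 1_048_576
OUTPUT_TOKEN_LIMIT = 65_536

def approx_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_budget(prompt: str, max_output_tokens: int) -> bool:
    """True if the request plausibly fits both documented limits."""
    return (approx_tokens(prompt) <= INPUT_TOKEN_LIMIT
            and max_output_tokens <= OUTPUT_TOKEN_LIMIT)
```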
Pricing & Access
Current Gemini API pricing lists Gemini 2.5 Flash-Lite at $0.10 per 1M input tokens and $0.40 per 1M output tokens, with audio input priced higher. Access is available through Google AI Studio and Vertex AI where the stable Flash-Lite SKU is enabled.
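For budget planning, a back-of-envelope cost estimator at the listed text rates looks like this. The $0.10-per-1M input rate is taken from the Gemini API pricing page at the time of this snapshot and may change; audio input is billed at a higher rate not modeled here.

```python
# Back-of-envelope cost estimate at the listed text rates for
# Gemini 2.5 Flash-Lite (point-in-time snapshot; verify current pricing).
INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens (text)
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a given token volume at text rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
```

For example, a workload of 10M input and 2M output tokens per day would run about $1.80/day at these rates.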
Best Use Cases
Best for ticket triage, data normalization, lightweight support automation, and high-volume internal tooling where responsiveness and budget matter.
Comparisons
Compared with Gemini 2.5 Flash, Flash-Lite trades a lower quality ceiling on difficult tasks for lower cost. Compared with GPT-5 nano, both target high-volume automation with different ecosystem tradeoffs. Compared with Claude Haiku 4.5, the right choice depends on latency profile, output style, and integration requirements.