Gemini 3.1 Flash-Lite
Google·Mar 2026proprietary
Gemini 3.1 Flash-Lite is Google's high-volume tier, designed for workloads where unit economics dominate: bulk summarization, media tagging, classification and high-QPS chat. Unusually for its price class it keeps full multimodal input (including video) and the 1M-token context window, which makes it a strong pick for cheap first-pass analysis of large media libraries.
Benchmark results
No verified benchmark results tracked yet for Gemini 3.1 Flash-Lite. This page is updated as official evaluations are published.
Where it shines
- Multimodal input at commodity pricing
- 1M context in the budget tier
- Very high rate limits for production scale
Alternatives to Gemini 3.1 Flash-Lite
Alibaba's trillion-parameter API flagship — frontier-adjacent quality with strong agentic tool use at mid-tier prices.
OpenAI's flagship reasoning model with a 1M-token context window, built for hard coding, science and long-horizon agentic work.
OpenAI's August 2025 unified flagship that merged the GPT and o-series reasoning lines into one model with selectable effort.
OpenAI's dedicated 2025 reasoning model that pioneered thinking-with-images and agentic tool use within chain-of-thought.
Frequently asked questions
- How much does the Gemini 3.1 Flash-Lite API cost?
- Gemini 3.1 Flash-Lite costs $0.25 per million input tokens and $1.5 per million output tokens. A workload of 10M input and 1.5M output tokens per month costs about $4.75.
- What is the context window of Gemini 3.1 Flash-Lite?
- Gemini 3.1 Flash-Lite supports a context window of 1,000,000 tokens (1M), with up to 66K output tokens per response.
- Is Gemini 3.1 Flash-Lite open source?
- No — Gemini 3.1 Flash-Lite is a proprietary model available through Google's API and partner platforms.
- What are the best alternatives to Gemini 3.1 Flash-Lite?
- The closest alternatives by overall capability are Qwen3-Max, GPT-5.5, GPT-5, OpenAI o3. See the comparison pages for detailed head-to-head breakdowns.