gemini-2.5-flash-lite

Overview

Gemini 2.5 Flash is a fast, cost-efficient model launched in May 2025. It supports a context window of roughly 1M tokens, multimodal input (text, images, audio, and video), and reasoning depth that can be tuned through a "thinking budget".

Release Date: May 20, 2025
Multimodal Support: Text + Images + Audio + Video
Context Length: 1,048,576 tokens
Output Limit: 65,536 tokens
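
The two token limits above lend themselves to a simple pre-flight check before a request is sent. The following is a minimal sketch, not part of any SDK: the helper name check_request is hypothetical, treating the context length as the cap on input tokens is an assumption, and the token counts are expected to come from whatever tokenizer or token-counting endpoint you already use.

```python
# Limits copied from the spec list above.
CONTEXT_LIMIT = 1_048_576  # assumed here to be the cap on input tokens
OUTPUT_LIMIT = 65_536      # cap on generated tokens


def check_request(input_tokens: int, max_output_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits.

    Hypothetical helper for illustration only.
    """
    problems = []
    if input_tokens > CONTEXT_LIMIT:
        over = input_tokens - CONTEXT_LIMIT
        problems.append(f"input exceeds context length by {over} tokens")
    if max_output_tokens > OUTPUT_LIMIT:
        over = max_output_tokens - OUTPUT_LIMIT
        problems.append(f"requested output exceeds output limit by {over} tokens")
    return problems


if __name__ == "__main__":
    # A long transcript that still fits, with a modest output request.
    print(check_request(input_tokens=900_000, max_output_tokens=8_192))
    # An oversized prompt and an oversized output request.
    print(check_request(input_tokens=1_200_000, max_output_tokens=100_000))
```

Returning a list of problems rather than raising lets a caller report both violations at once, for example before deciding whether to truncate or split the input.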

Features

Ultra-low latency and lightweight
Controllable reasoning via "thinking budget"
Multilingual fluency
Supports image, audio, and video input
Optimized for cost-sensitive applications
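
To make the "thinking budget" feature concrete, the sketch below builds a JSON request body that carries a budget value. It uses only the standard library and sends nothing over the network. The field names (contents, parts, generationConfig, thinkingConfig, thinkingBudget) are assumed to match the Gemini API's REST format and should be verified against the current API reference; the helper name build_request and the example budget of 1024 are purely illustrative.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request body with an explicit thinking budget.

    Hypothetical helper. Field names are an assumption about the
    Gemini API's REST JSON shape, not something stated in this document.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Number of tokens the model may spend on internal reasoning.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    body = build_request("Summarize this transcript in three bullets.", 1024)
    print(json.dumps(body, indent=2))
```

A smaller budget would be expected to favor latency and cost, a larger one to favor answer quality on harder prompts; the valid range for the budget depends on the model and is not given here.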

Benchmarks

Task	Gemini 2.5 Flash	GPT-4o Mini	Claude Haiku
MMLU (reasoning)	80.9%	82.0%	73.8%
MGSM (math)	79.7%	87.0%	67.8%
HumanEval (coding)	74.1%	87.2%	64.9%
MMMU (multimodal)	56.8%	59.4%	54.9%

Pricing

Type	Price per 1M tokens
Input	$0.15
Output	$0.60

Competitive pricing for long-context, multimodal applications.
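
As a worked example of the rates in the table, the sketch below estimates the cost of a single request. It assumes the two listed prices apply uniformly to all input and output tokens; any tiering, modality-specific rates, or separate charges for reasoning tokens are not covered by this document and are ignored. The function name estimate_cost is hypothetical.

```python
# Rates copied from the pricing table above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD, assuming flat per-token rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Summarizing a 200,000-token transcript into a 2,000-token summary:
    # 200,000 * 0.15 / 1M = 0.0300 and 2,000 * 0.60 / 1M = 0.0012.
    print(f"${estimate_cost(200_000, 2_000):.4f}")
```

Under these assumptions, input tokens dominate the bill for long-context workloads such as transcript summarization, even though output tokens cost four times as much per token.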

Use Cases

Streaming chat interfaces and support agents
Summarizing long transcripts or PDFs
Real-time audio/video captioning
Code generation and assistance
Data-heavy RAG pipelines

Safety and Stability

Built on the Gemini safety alignment framework
Tuned for instruction following and resistance to hallucination
Reasoning throttling via "thinking budget"
Stable API access via Vertex AI and Gemini API

Limitations

No web browsing (static knowledge)
Slightly lower accuracy than Gemini Pro on complex tasks
Access currently requires Google Cloud (Vertex AI) or the Gemini API

License

Available for commercial and enterprise use via Vertex AI and Gemini API.