gemini-2.5-flash
by google
Pricing
Input
$0.12 / 1M tokens
Output
$1.00 / 1M tokens
Gemini 2.5 Flash Release by Google
Overview
Gemini 2.5 Flash is a fast, cost-efficient model from Google, launched in May 2025. It supports a 1M-token context window, multimodal input, and controllable reasoning via a "thinking budget"; benchmark results are listed below.
- Release Date: May 20, 2025
- Multimodal Support: Text + Images + Audio + Video
- Context Length: 1,048,576 tokens
- Output Limit: 65,536 tokens
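As a rough illustration of how the two limits above might be enforced on the client side, the sketch below checks a request's token counts before it is sent. The helper name `check_limits` is hypothetical, and it assumes the caller already has token counts from a tokenizer or counting endpoint; only the two limit values come from the list above.

```python
# Limits as listed in the overview above.
CONTEXT_LIMIT_TOKENS = 1_048_576
OUTPUT_LIMIT_TOKENS = 65_536


def check_limits(prompt_tokens: int, max_output_tokens: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the request fits.

    Hypothetical helper: each limit is checked independently, and the token
    counts are assumed to be supplied by the caller.
    """
    problems = []
    if prompt_tokens > CONTEXT_LIMIT_TOKENS:
        over = prompt_tokens - CONTEXT_LIMIT_TOKENS
        problems.append(f"prompt is {over} tokens over the context limit")
    if max_output_tokens > OUTPUT_LIMIT_TOKENS:
        over = max_output_tokens - OUTPUT_LIMIT_TOKENS
        problems.append(f"requested output is {over} tokens over the output limit")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report both issues at once, for example when a long transcript is paired with an oversized output request.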
Features
- Lightweight, with very low latency
- Controllable reasoning via “thinking budget”
- Multilingual fluency
- Supports image, audio, and video input
- Optimized for cost-sensitive applications
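To make the "thinking budget" feature more concrete, here is a minimal sketch that builds a JSON request body carrying a budget. It does not call any service. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`, `maxOutputTokens`) are assumed to mirror the public Gemini REST API and should be checked against the current API reference; `build_request` itself is a hypothetical helper.

```python
import json

OUTPUT_LIMIT_TOKENS = 65_536  # output limit listed in the overview


def build_request(prompt: str, thinking_budget: int, max_output_tokens: int = 1024) -> dict:
    """Build a generateContent-style JSON body with a thinking budget.

    Assumption: the field names below follow the public Gemini REST API;
    verify them before relying on this sketch.
    """
    if thinking_budget < 0:
        raise ValueError("thinking_budget must be non-negative in this sketch")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
            # Never ask for more output than the documented limit.
            "maxOutputTokens": min(max_output_tokens, OUTPUT_LIMIT_TOKENS),
        },
    }


if __name__ == "__main__":
    # A small budget for a latency-sensitive call; the value 512 is arbitrary.
    body = build_request("Summarize this ticket.", thinking_budget=512)
    print(json.dumps(body, indent=2))
```

A lower budget trades reasoning depth for latency and cost, which is the lever the feature list describes; the right value depends on the task and is best found by measurement.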
Benchmarks
Task | Gemini 2.5 Flash | GPT-4o Mini | Claude Haiku |
---|---|---|---|
MMLU (reasoning) | 80.9% | 82.0% | 73.8% |
MGSM (math) | 79.7% | 87.0% | 67.8% |
HumanEval (coding) | 74.1% | 87.2% | 64.9% |
MMMU (multimodal) | 56.8% | 59.4% | 54.9% |
Pricing
Type | Price per 1M tokens |
---|---|
Input | $0.15 |
Output | $0.60 |
Competitive pricing for long-context, multimodal applications.
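The per-token arithmetic can be sketched as follows. The default rates are the ones in the table above; they are exposed as parameters because the figures in the page header differ, and the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Rates from the pricing table above, in USD per 1M tokens.
INPUT_RATE = Decimal("0.15")
OUTPUT_RATE = Decimal("0.60")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: Decimal = INPUT_RATE,
    output_rate: Decimal = OUTPUT_RATE,
) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    total = Decimal(input_tokens) * input_rate + Decimal(output_tokens) * output_rate
    return total / TOKENS_PER_UNIT
```

For example, summarizing a 200,000-token transcript into a 2,000-token summary comes to 200,000 × $0.15 / 1M + 2,000 × $0.60 / 1M = $0.0312 at the table rates. `Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.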
Use Cases
- Streaming chat interfaces and support agents
- Summarizing long transcripts or PDFs
- Real-time audio/video captioning
- Code generation and assistance
- Data-heavy RAG pipelines
Safety and Stability
- Built on Gemini safety alignment framework
- Tuned for instruction following and resistance to hallucination
- Reasoning depth can be capped via the "thinking budget"
- Stable API access via Vertex AI and Gemini API
Limitations
- No web browsing (static knowledge)
- Slightly lower accuracy than Gemini Pro on complex tasks
- Currently requires Google Cloud or Gemini API access
License
Available for commercial and enterprise use via Vertex AI and Gemini API.