gemini-2.5-flash-lite

by google

Pricing

Input $0.04 / 1M tokens
Output $0.16 / 1M tokens

Gemini 2.5 Flash Release by Google


Overview

Gemini 2.5 Flash is a fast, cost-efficient model launched in May 2025. It supports a 1M-token context window, multimodal input (text, images, audio, and video), and controllable reasoning through a configurable “thinking budget”.

  • Release Date: May 20, 2025
  • Multimodal Support: Text + Images + Audio + Video
  • Context Length: 1,048,576 tokens
  • Output Limit: 65,536 tokens
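
The two limits above can be checked before a request is sent. The following is a minimal sketch, assuming the context length and output limit are enforced independently and that token counts are already known (for example, from a tokenizer or a token-counting call). The function name `check_request` is hypothetical and not part of any Google SDK.

```python
# Limits taken from the list above.
CONTEXT_LIMIT = 1_048_576  # maximum input (context) tokens
OUTPUT_LIMIT = 65_536      # maximum output tokens


def check_request(prompt_tokens: int, max_output_tokens: int) -> list:
    """Return a list of human-readable problems; an empty list means the request fits.

    Hypothetical helper: assumes input and output limits are checked separately.
    """
    problems = []
    if prompt_tokens > CONTEXT_LIMIT:
        over = prompt_tokens - CONTEXT_LIMIT
        problems.append(f"prompt exceeds context length by {over} tokens")
    if max_output_tokens > OUTPUT_LIMIT:
        over = max_output_tokens - OUTPUT_LIMIT
        problems.append(f"requested output exceeds output limit by {over} tokens")
    return problems


if __name__ == "__main__":
    # A long transcript that still fits, with a modest output cap.
    print(check_request(900_000, 8_192))
    # A request that violates both limits.
    print(check_request(2_000_000, 100_000))
```

A check like this is most useful in long-document or RAG pipelines, where the prompt size varies from request to request.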

Features

  • Ultra-low latency and lightweight
  • Controllable reasoning via “thinking budget”
  • Multilingual fluency
  • Supports image, audio, and video input
  • Optimized for cost-sensitive applications
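
As an illustration of the “thinking budget” feature, the sketch below builds a request body for the Gemini API's `generateContent` REST method without sending it. It assumes the budget is passed as `generationConfig.thinkingConfig.thinkingBudget` and that the endpoint follows the `v1beta/models/{model}:generateContent` pattern; both should be verified against the current Gemini API documentation. The allowed range of budget values is model-specific and is not stated on this page, so the helper (a hypothetical name, `build_request`) does not validate it.

```python
import json

# Assumed endpoint pattern for the Gemini API; verify against current docs.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model: str, prompt: str, thinking_budget: int):
    """Return (url, json_body) for a generateContent call with a thinking budget.

    Hypothetical helper: it only constructs the request, it does not send it.
    """
    url = f"{BASE_URL}/{model}:generateContent"
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed field names for the reasoning cap.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }
    return url, json.dumps(body, indent=2)


if __name__ == "__main__":
    url, body = build_request("gemini-2.5-flash", "Summarize this transcript.", 1024)
    print(url)
    print(body)
```

Sending the request additionally requires an API key, which is omitted here. A smaller budget generally trades reasoning depth for lower latency and cost.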

Benchmarks

Task                 Gemini 2.5 Flash   GPT-4o Mini   Claude Haiku
MMLU (reasoning)     80.9%              82.0%         73.8%
MGSM (math)          79.7%              87.0%         67.8%
HumanEval (coding)   74.1%              87.2%         64.9%
MMMU (multimodal)    56.8%              59.4%         54.9%

Pricing

Type     Price per 1M tokens
Input    $0.15
Output   $0.60

Competitive pricing for long-context, multimodal applications.
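
To make the per-token arithmetic concrete, here is a minimal sketch of a cost estimate using the prices from the table above as defaults. The function name `estimate_cost` is hypothetical, and the prices are parameters because the rates actually billed may differ by provider or model variant.

```python
# Default prices in USD per 1M tokens, taken from the table above.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float = INPUT_PRICE_PER_M,
                  output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example: summarizing a 200,000-token transcript into a 2,000-token summary.
    cost = estimate_cost(200_000, 2_000)
    print(f"${cost:.4f}")
```

For the example call, the input side contributes 200,000 × $0.15 / 1M = $0.0300 and the output side 2,000 × $0.60 / 1M = $0.0012, so input tokens dominate the cost of long-context workloads.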


Use Cases

  • Streaming chat interfaces and support agents
  • Summarizing long transcripts or PDFs
  • Real-time audio/video captioning
  • Code generation and assistance
  • Data-heavy RAG pipelines

Safety and Stability

  • Built on Gemini safety alignment framework
  • Instruction-following and hallucination resistance
  • Reasoning depth can be capped via the “thinking budget”
  • Stable API access via Vertex AI and Gemini API

Limitations

  • No web browsing (static knowledge)
  • Slightly lower accuracy than Gemini Pro on complex tasks
  • Currently requires Google Cloud or Gemini API access

License

Available for commercial and enterprise use via Vertex AI and Gemini API.