veo3_fast
by google
Pricing
Input
$0.00 / 1M tokens
Output
$1.12 / 1M tokens
Overview
Veo 3 is a next-generation text-to-video model developed by Google DeepMind and released in May 2025. It produces cinematic-quality 8-second video clips from text or image prompts with native audio, including dialogue, sound effects, and music. Veo 3 pushes the boundary of AI-generated video by combining photorealism, motion consistency, and rich sound design, all within seconds.
Key Features
- Up to 8 seconds of video per prompt
- Audio support with synchronized dialogue, music, and effects
- Multimodal inputs: accepts both text and image guidance
- Cinematic camera motion and shot control
- High physical realism: consistent lighting, motion, and perspective
- Built-in watermarking with SynthID and visible branding
Use Cases
- Film pre-visualization and storyboarding
- Advertising and branded content creation
- Narrative storytelling with sound and motion
- Educational visualizations and explainer media
- Prototyping cinematic ideas in tools like Google Flow
Limitations
- Videos are capped at 8 seconds per generation
- Cannot generate real face likenesses or explicit content
- Region-limited access (Vertex AI and Gemini Ultra only)
Access and Pricing
- Gemini Advanced (Ultra): Full Veo 3 access with audio and Flow integration
- Gemini Pro: Access to Veo Fast variant (text-to-video without sound)
- Vertex AI: Enterprise API access via the `veo-3.0-generate-preview` endpoint
Google does not currently offer per-video pricing. Access is included in the Gemini subscription tiers or through enterprise billing on Vertex AI.
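For readers planning to use the Vertex AI route, the following is a minimal sketch of how a request for the `veo-3.0-generate-preview` model might be assembled. It only builds the URL and JSON body and sends nothing. The helper name `build_veo_request`, the `:predictLongRunning` URL suffix, and the `durationSeconds` and `generateAudio` parameter names are assumptions for illustration and are not confirmed by this page; check them against the current Vertex AI documentation before use. The 8-second cap comes from the limitations listed above.

```python
import json

# Model ID taken from the access list in this document.
MODEL_ID = "veo-3.0-generate-preview"
# Duration cap stated in this document's limitations.
MAX_DURATION_SECONDS = 8


def build_veo_request(project_id, location, prompt,
                      duration_seconds=MAX_DURATION_SECONDS,
                      generate_audio=True):
    """Return (url, body) for a hypothetical Veo 3 generation request.

    Assumption: the URL follows the general Vertex AI publisher-model
    pattern, and the parameter names below are illustrative only.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration_seconds must be between 1 and {MAX_DURATION_SECONDS}"
        )

    url = (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{MODEL_ID}:predictLongRunning"
    )
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "durationSeconds": duration_seconds,  # assumed parameter name
            "generateAudio": generate_audio,      # assumed parameter name
        },
    }
    return url, body


if __name__ == "__main__":
    # Placeholder project and region; replace with real values.
    url, body = build_veo_request(
        "my-project", "us-central1",
        "A slow dolly shot through a rainy neon-lit street",
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Validating the duration on the client side keeps an out-of-range request from being submitted at all, which may matter if each generation is billed.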
Performance Comparison
Model | Video Duration | Audio Support | Prompt Type | Access |
---|---|---|---|---|
Veo 3 | Up to 8 sec | Yes | Text + Image | Gemini Ultra / Vertex AI |
DALL·E 3 (via GPT-4) | Static images only | No | Text | ChatGPT Pro |
Imagen 4 | Static images only | No | Text + Image | Gemini Ultra |