qwen-vlo

by qwen

Pricing

Input $0.00 / 1M tokens

Output $0.02 / 1M tokens

Qwen VLo – Multimodal Text-to-Image & Editing Power

Overview

Qwen VLo is Alibaba Cloud’s open-source text-to-image and image-editing model, designed for interactive visual generation with intelligent refinement. As of the latest release (Qwen 2.5 Max, July 2025), it supports:

Seamless progressive image generation – watch the image build step-by-step
Powerful image editing via natural language, including background and object replacement
Multilingual prompts and integration with Qwen Chat and API

Features

Progressive generation – see image evolution from sketch to detail
Interactive editing – modify objects, scenes, lighting, and composition via prompts
Multilingual input – fluent in English and Chinese text instructions
Instructable images – edit or expand generated images with follow-up commands
API-ready – accessible through Alibaba Cloud for app integration
Free to use – available via Qwen Chat with no login required

Release Timeline

Version	Release Date	Highlights
Qwen-VL	Sept 2023	Multimodal understanding (VQA, captioning)
Qwen2-VL	Jan 2024	Dynamic image resolution support
Qwen2.5-VL	Apr 2025	Text-in-image, OCR, and document parsing
Qwen VLo	July 2025	Text-to-image & editing, progressive gen

Use Cases

Content Creation: posters, concept art, marketing visuals via prompt
Design Iteration: adjust or refine visuals interactively with natural language
Multilingual Visual Tools: generate or modify images in both English and Chinese
Workflow Integration: deploy within creative pipelines using Alibaba Cloud API

Strengths & Limitations

Strengths
- Progressive generation with live visual updates
- Rich image editing from natural-language instructions
- Multilingual prompt understanding with open accessibility
Limitations
- Less style variety compared to Midjourney or Ideogram
- Advanced edits may require prompt tuning or retries