qwen-vlo

by qwen

Pricing

Input $0.00 / 1M tokens
Output $0.02 / 1M tokens
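
As a rough illustration of what the listed rates mean per request, the sketch below turns token counts into a dollar estimate. The two rates are copied from the pricing lines above; the function name and the sample token counts are illustrative assumptions, and the listing does not say how a generated image is counted in output tokens.

```python
from decimal import Decimal

# Rates copied from the pricing lines above (USD per 1M tokens).
INPUT_USD_PER_MILLION = Decimal("0.00")
OUTPUT_USD_PER_MILLION = Decimal("0.02")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated charge in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Decimal avoids binary floating-point rounding on very small amounts.
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed sample request: 1,200 tokens in, 4,096 tokens out.
    print(estimate_cost_usd(1_200, 4_096))
```

At these rates only output tokens contribute to the total, so one million output tokens would come to $0.02 regardless of prompt length.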

Qwen VLo – Multimodal Text-to-Image & Editing Power


Overview

Qwen VLo is a text-to-image and image-editing model from Alibaba Cloud’s Qwen team, designed for interactive visual generation and iterative refinement. As of its initial preview release in mid-2025, it supports:

  • Seamless progressive image generation – watch the image build step-by-step
  • Powerful image editing via natural language, including background and object replacement
  • Multilingual prompts and integration with Qwen Chat and API

Features

  • Progressive generation – see image evolution from sketch to detail
  • Interactive editing – modify objects, scenes, lighting, and composition via prompts
  • Multilingual input – accepts text instructions in English and Chinese
  • Instructable images – edit or expand generated images with follow-up commands
  • API-ready – accessible through Alibaba Cloud for app integration
  • Free to use – available via Qwen Chat with no login required

Release Timeline

Version       Release Date   Highlights
Qwen-VL       Aug 2023       Multimodal understanding (VQA, captioning)
Qwen2-VL      Aug 2024       Dynamic image resolution support
Qwen2.5-VL    Jan 2025       Text-in-image, OCR, and document parsing
Qwen VLo      June 2025      Text-to-image & editing, progressive generation

Use Cases

  • Content Creation: posters, concept art, marketing visuals via prompt
  • Design Iteration: adjust or refine visuals interactively with natural language
  • Multilingual Visual Tools: generate or modify images in both English and Chinese
  • Workflow Integration: deploy within creative pipelines using Alibaba Cloud API
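
The iterative use cases above (generate, then refine with follow-up instructions) can be sketched as a small session object that keeps the current image and the ordered instruction history. This is a minimal sketch, not the Alibaba Cloud API: `EditSession`, `ImageBackend`, and `fake_backend` are hypothetical names, and the backend is a plain callable that a real integration would replace with its own model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical backend signature: takes the current image (None for a fresh
# generation) and a natural-language instruction, returns the new image bytes.
ImageBackend = Callable[[Optional[bytes], str], bytes]


@dataclass
class EditSession:
    """Tracks an image and the instructions applied to it, in order."""

    backend: ImageBackend
    max_attempts: int = 3
    image: Optional[bytes] = None
    history: List[str] = field(default_factory=list)

    def apply(self, instruction: str) -> bytes:
        """Run one instruction, retrying when the backend raises RuntimeError."""
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            try:
                self.image = self.backend(self.image, instruction)
            except RuntimeError as exc:
                last_error = exc
                continue
            # Record only instructions that actually produced an image.
            self.history.append(instruction)
            return self.image
        raise RuntimeError(f"gave up after {self.max_attempts} attempts") from last_error


def fake_backend(image: Optional[bytes], instruction: str) -> bytes:
    """Stand-in for a real model call; it just chains the instructions."""
    previous = image or b""
    return previous + instruction.encode("utf-8") + b";"


if __name__ == "__main__":
    session = EditSession(backend=fake_backend)
    session.apply("a poster of a lighthouse at dusk")
    session.apply("replace the background with a stormy sky")
    print(session.history)
```

Keeping the backend as an injected callable means the same session logic can be exercised with a stub in tests and pointed at a real service in a pipeline. The retry loop reflects the point that more advanced edits may take more than one attempt.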

Strengths & Limitations

  • Strengths
    • Progressive generation with live visual updates
    • Rich image editing from natural-language instructions
    • Multilingual prompt understanding, with free access through Qwen Chat
  • Limitations
    • Less style variety than Midjourney or Ideogram
    • Advanced edits may require prompt tuning or retries