z-image/turbo

Z-Image-Turbo is a 6-billion-parameter text-to-image model that generates photorealistic images in under a second. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Z-Image-Turbo — 6B-parameter, ultra-fast text-to-image

Z-Image-Turbo is a 6B-parameter text-to-image model from Tongyi-MAI, engineered for production workloads where latency and throughput matter. It renders a full image in only 8 sampling steps, achieving sub-second latency on data-center GPUs and running comfortably on many consumer cards with 16 GB of VRAM.

Ultra-fast generation with production-ready quality

Where many diffusion models need dozens of steps, Z-Image-Turbo is aggressively optimised around an 8-step sampler. That keeps inference extremely fast while still delivering photorealistic images and reliable on-image text, making it a strong fit for interactive products, dashboards, and large-scale backends—not just offline batch jobs.

Why it looks so good

  • Photorealistic output at speed – Generates high-fidelity, realistic images that work for product photos, hero banners, and UI visuals without multi-second waits.
  • Bilingual prompts and text – Understands prompts in English and Chinese, and can render multilingual text directly in the image, which helps with cross-market campaigns, posters, and screenshots.
  • Low-latency, low-step design – Only 8 function evaluations per image keep latency extremely low, ideal for chatbots, configuration tools, design assistants, and any “click → image” experience.
  • Friendly VRAM footprint – Runs well in 16 GB VRAM environments, reducing hardware costs and making local or edge deployments more realistic.
  • Scales for bulk generation – Its efficiency makes large jobs (catalogues, continuous feed images, or auto-generated thumbnails) practical without blowing up compute budgets.
  • Reproducible generations – A controllable seed parameter lets you recreate a previous image or generate small, controlled variations for brand safety and experimentation.

How to use

  • prompt – natural-language description of the scene, style, and any on-image text (English or Chinese).
  • size (width / height) – choose the output resolution; supports square and rectangular images up to high resolutions (for example, 1536 × 1536).
  • seed – set to -1 for random results, or use a fixed integer to make outputs reproducible.
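
The parameters above might be assembled into a JSON request body along the following lines. This is a minimal sketch, not the service's documented schema: the field names `prompt`, `size`, and `seed` follow the list above, while the `"width*height"` size string, the helper name `build_payload`, and the 1536-pixel limit (taken from the example resolution) are assumptions for illustration. Check the API reference for the real schema and endpoint before sending requests.

```python
import json


def build_payload(prompt: str, width: int = 1024, height: int = 1024, seed: int = -1) -> dict:
    """Assemble a request body from the parameters described above.

    Hypothetical helper: the "width*height" size string and the 1536 limit
    are assumptions, not confirmed details of the API.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not (0 < width <= 1536 and 0 < height <= 1536):
        raise ValueError("width and height must be between 1 and 1536")
    if seed < -1:
        raise ValueError("seed must be -1 (random) or a non-negative integer")
    return {"prompt": prompt, "size": f"{width}*{height}", "seed": seed}


# Fixed seed: resubmitting the same payload is meant to reproduce the image.
fixed = build_payload("A neon sign that says 'OPEN' in a rainy alley", 1024, 768, seed=42)

# Default seed of -1 asks the service to pick a random seed.
random_seed = build_payload("A product photo of a ceramic mug")

print(json.dumps(fixed, indent=2))
```

Keeping the seed in the payload, and logging it next to each result, is what makes a later regeneration or controlled variation possible.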

Pricing

Simple per-image billing:

  • Without prompt rewriting (prompt_extend=false): $0.015 per generated image
  • With prompt rewriting (prompt_extend=true): $0.03 per generated image
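
For budgeting, the two per-image prices above can be turned into a quick estimate. The sketch below uses only those two figures; the function name `estimate_cost` is hypothetical, and it assumes billing is strictly linear in the number of generated images with no volume discounts or minimum charges.

```python
from decimal import Decimal

# Per-image prices quoted in the list above (USD).
PRICE_PLAIN = Decimal("0.015")    # prompt_extend=false
PRICE_EXTENDED = Decimal("0.03")  # prompt_extend=true


def estimate_cost(num_images: int, prompt_extend: bool = False) -> Decimal:
    """Return the estimated charge in USD for a batch of generated images."""
    if num_images < 0:
        raise ValueError("num_images must not be negative")
    unit_price = PRICE_EXTENDED if prompt_extend else PRICE_PLAIN
    return unit_price * num_images


print("10,000 images, no rewriting:  $", estimate_cost(10_000))
print("10,000 images, with rewriting: $", estimate_cost(10_000, prompt_extend=True))
```

Under these assumptions a 10,000-image batch comes to $150 without prompt rewriting and $300 with it. `Decimal` is used instead of `float` so that small per-image prices do not accumulate rounding error.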

Try more models and compare the results

  • Nano Banana Pro – Text-to-Image – Google’s Nano Banana Pro (Gemini 3.0 Pro Image family) delivers high-quality multi-image generation with extremely low cost per image, ideal for large-scale applications.
  • Seedream V4 – Text-to-Image – ByteDance’s high-resolution text-to-image model with rich detail and diverse styles, well suited for creative illustration and commercial visuals.
  • FLUX.2 [dev] – Text-to-Image – A lightweight FLUX.2-based base model hosted by AtlasCloud, optimised for efficient inference and LoRA-friendly training.

Paper

Tongyi-MAI/Z-Image-Turbo

Specifications in Depth

Overview:

Model Provider: TONGYIMAI
Model Type: text-to-image
Deployment: Inferencing API; Playground
Pricing: $0.01/pic

Key Specs:

Size Cap: up to width × height (user-configurable)
LoRA Support: No
Seed Options: N/A

Create Your Next Masterpiece

Z-Image Turbo - Lightning-Fast Text-to-Image Generation

6-Billion-Parameter Model by Alibaba's Tongyi-MAI Team

Z-Image Turbo is the #1 ranked open-source text-to-image model, surpassing FLUX.2 [dev], HunyuanImage 3.0, and Qwen-Image on the Artificial Analysis Image Arena. Built by Alibaba's Tongyi-MAI team (a separate division from Qwen/Wan), this 6B parameter model achieves sub-second generation through advanced Decoupled-DMD distillation while maintaining photorealistic quality. With only 8 inference steps, it fits within 16GB VRAM and delivers professional results optimized for speed-critical production environments.

Ultra-Fast Generation

  • Only 8 inference steps (vs 20-50 for competitors)
  • Sub-second generation on H800 GPUs
  • 1.31-1.41× faster than Qwen Image per step
  • Fits in 16GB VRAM (RTX 3060/4090)

Photorealistic Quality

  • #1 ranked open-source model on AI Arena
  • Bilingual text rendering (English & Chinese)
  • Robust instruction adherence
  • Beats FLUX.1 [dev] and Qwen in all categories

Alibaba's Strategic Model Portfolio

Alibaba offers three specialized AI image generation systems, each optimized for a different use case.

Speed Champion

Z-Image Turbo

Tongyi-MAI Team

Best For: Speed-critical production workloads
  • ⚡ Fastest: 8 steps, sub-second generation
  • 🏆 #1 ranked open-source model
  • 💰 Most cost-effective ($0.005/image)
  • 🎯 Optimized for rapid iteration

Quality King

Qwen-Image

Qwen Team

Best For: Maximum quality final renders
  • 🎨 Unmatched photorealism & skin textures
  • 💡 Superior lighting interactions
  • ⏱️ Slower (20s vs 5-10s for Z-Image)
  • 🎯 Best for high-end production work

Versatility Pro

Wan 2.5/2.6

Wan Team

Best For: Multimedia versatility
  • 🎬 Text-to-Video + Image-to-Video
  • 📹 Multi-resolution support (480P-720P)
  • 🔄 Audio-visual synchronization
  • 🎯 Cross-modal content generation

Key Insight: Z-Image Turbo is 1.31-1.41× faster than Qwen-Image per step, making it ideal for applications requiring rapid generation. While Qwen-Image offers slightly better photorealism for final renders, Z-Image Turbo provides the best balance of speed and quality for production environments.
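
The per-step figure can be combined with the step counts quoted earlier on this page to get a rough end-to-end estimate. The sketch below is back-of-the-envelope arithmetic, not a benchmark: it assumes the comparison model runs 20 to 50 steps (the range the page gives for competitors in general), that total time scales linearly with step count, and that text encoding and image decoding are negligible. The function name `end_to_end_speedup` is hypothetical.

```python
def end_to_end_speedup(per_step_ratio: float, other_steps: int, turbo_steps: int = 8) -> float:
    """Rough speedup = (other_steps / turbo_steps) * per_step_ratio."""
    if other_steps <= 0 or turbo_steps <= 0:
        raise ValueError("step counts must be positive")
    return per_step_ratio * other_steps / turbo_steps


# Most conservative case: smallest per-step gain, fewest competitor steps.
low = end_to_end_speedup(1.31, other_steps=20)
# Most favourable case: largest per-step gain, most competitor steps.
high = end_to_end_speedup(1.41, other_steps=50)

print(f"Estimated end-to-end speedup: {low:.1f}x to {high:.1f}x")
# prints: Estimated end-to-end speedup: 3.3x to 8.8x
```

Measured wall-clock results can differ from this range, since fixed per-request overheads do not shrink with the step count.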

Technical Highlights

Performance
S3-DiT Architecture

Adopts Single-Stream Diffusion Transformer (S3-DiT) architecture that unifies processing of various conditional inputs. This 6B parameter design achieves professional results without the computational overhead of larger models while maintaining state-of-the-art quality.

Speed
Decoupled-DMD Distillation

Advanced distillation algorithm with CFG Augmentation and Distribution Matching mechanisms enables 8-step inference (vs 20-50 for competitors). Achieves sub-second generation on H800 GPUs and runs smoothly on consumer RTX 3060/4090 with 16GB VRAM.

Quality
Leading Open-Source Performance

Ranked #1 open-source model on Artificial Analysis Image Arena, beating FLUX.2 [dev], HunyuanImage 3.0, and Qwen-Image. Excels at bilingual text rendering (English & Chinese), photorealistic generation, and robust instruction following. Released under Apache 2.0 license for commercial use.

Perfect For

  • Digital Art Creation
  • Product Photography
  • Marketing Materials
  • Concept Art
  • Social Media Content
  • Stock Photography
  • Game Assets
  • Creative Prototyping

Why Choose Z-Image Turbo

  • Instant Results – Sub-second generation with zero cold start latency. Get your images immediately without any waiting.
  • Cost-Effective – Affordable pricing at $0.005 per image. Scale your creative projects without breaking the budget.
  • Ready-to-Use API – Simple REST API integration. Start generating images in minutes with our comprehensive documentation.

Technical Specifications

Model Size: 6 Billion Parameters
Inference Steps: 8 NFEs (Number of Function Evaluations)
Generation Speed: Sub-second on H800, 5-10s on consumer GPUs
VRAM Requirement: 16GB (RTX 3060/4090 compatible)
Architecture: Single-Stream Diffusion Transformer (S3-DiT)
Distillation Method: Decoupled-DMD with CFG Augmentation
License: Apache 2.0 (Commercial Use Allowed)
Ranking: #1 Open-Source on Artificial Analysis Arena
Pricing: $0.005 per Image
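
The 16GB VRAM figure can be sanity-checked against the parameter count with simple arithmetic. The sketch below estimates the memory taken by the model weights alone at several storage precisions. Which precision a given deployment uses is an assumption here, and the estimate excludes the text encoder, the image decoder, and activation memory, so real usage will be higher.

```python
PARAMS = 6_000_000_000  # 6 billion parameters, from the specifications above

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16/fp16": 2, "int8": 1}


def weight_gib(params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>9}: {weight_gib(PARAMS, nbytes):5.1f} GiB")
# prints:
#      fp32:  22.4 GiB
# bf16/fp16:  11.2 GiB
#      int8:   5.6 GiB
```

At 16-bit precision the weights come to roughly 11 GiB, which is consistent with the model fitting on a 16GB card, while 32-bit weights alone would not fit.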

Start Creating with Z-Image Turbo

Experience lightning-fast, photorealistic image generation today. No setup required, just call our API and start creating.

  • No cold starts – instant generation
  • Affordable pricing – $0.005 per image
  • Professional quality results
