MAI-Image-2.5-Flash Text-to-Image API by Microsoft

microsoft/mai-image-2.5-flash/text-to-image
Text-to-image

Microsoft's fast, cost-optimized text-to-image generation model, creating high-quality images at lower cost using the same diffusion-based architecture as MAI-Image-2.5.

MAI-Image-2.5-Flash Text-to-Image

MAI-Image-2.5-Flash is Microsoft's fast, cost-optimized text-to-image generation model. It creates high-quality, visually rich images from natural language prompts at a lower per-image fee than the standard MAI-Image-2.5 ($0.03 versus $0.05). It uses the same diffusion-based generative approach, which gives strong alignment between the input text and the generated output, and it is optimized for speed and throughput. The model was released on June 2, 2026.

Key Capabilities

  • Photorealistic image synthesis — Generates realistic imagery with consistent visual structure, accurate lighting, depth, and texture, suitable for concept visualization and professional content creation.
  • High-fidelity portraits — Produces expressive, natural-looking portraits with accurate facial structure, lighting, and skin texture.
  • Accurate text rendering — Improved rendering of legible text within generated images, including labels, posters, packaging, and signage.
  • Visual reasoning — Reasons across objects, scene structure, lighting, scale, and spatial positioning to produce consistent outputs even from ambiguous or complex prompts.
  • Product, branding & commercial design — Well suited for product imagery, marketing visuals, brand assets, and commercial creative workflows.
  • Creative concept visualization — Translates abstract textual descriptions into visually coherent and imaginative outputs.

Flash vs. Standard

| Feature | MAI-Image-2.5 | MAI-Image-2.5-Flash |
| --- | --- | --- |
| Output image cost | $0.05 / image | $0.03 / image |
| Speed | Standard | Faster |
| Quality | Maximum fidelity | High quality, optimized |
| Best for | Premium production | High-volume, cost-sensitive |

Flash is the recommended choice for high-volume workflows, rapid prototyping, and scenarios where speed and cost efficiency take priority over absolute maximum fidelity.

Pricing

Pricing is based on two components: the input text tokens in the prompt, and a fixed per-image output fee.

| SKU | Description | Unit Price |
| --- | --- | --- |
| sku_input_1m_token | Price per 1M input (prompt) tokens | $5.00 |
| sku_output_image | Fixed fee per generated image | $0.03 |

Pricing Formula

cost = countTokens(prompt) / 1,000,000 × $5.00 + $0.03

For most prompts (a few hundred tokens), the token cost is negligible and the effective cost is approximately $0.03 per image, which is 40% lower than the $0.05 per-image fee of the standard MAI-Image-2.5.

Examples

| Prompt Length | Token Count | Token Cost | Image Fee | Total |
| --- | --- | --- | --- | --- |
| Short (e.g., 50 tokens) | 50 | ~$0.000250 | $0.03 | ~$0.0303 |
| Medium (e.g., 500 tokens) | 500 | ~$0.002500 | $0.03 | ~$0.0325 |
| Long (e.g., 2,000 tokens) | 2,000 | ~$0.010000 | $0.03 | ~$0.0400 |
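
The pricing formula above can be sketched as a small Python helper. This is an illustrative estimate only: the function name `estimate_cost` is hypothetical, the two prices are copied from the SKU table, and the caller is assumed to supply the prompt's token count, since the tokenizer behind `countTokens` is not specified here.

```python
from decimal import Decimal

# Prices copied from the SKU table on this page.
INPUT_PRICE_PER_1M_TOKENS = Decimal("5.00")  # sku_input_1m_token
OUTPUT_IMAGE_FEE = Decimal("0.03")           # sku_output_image


def estimate_cost(prompt_tokens: int) -> Decimal:
    """Estimate the USD cost of generating one image.

    `prompt_tokens` must come from the caller; how the service counts
    tokens is not documented here, so treat the result as an estimate.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    token_cost = Decimal(prompt_tokens) / Decimal(1_000_000) * INPUT_PRICE_PER_1M_TOKENS
    return token_cost + OUTPUT_IMAGE_FEE


if __name__ == "__main__":
    for tokens in (50, 500, 2_000):
        print(f"{tokens:>5} tokens: ${estimate_cost(tokens):.5f}")
```

Decimal arithmetic is used so that sub-cent token costs do not pick up floating-point noise. Multiplying the per-image estimate by a planned volume gives a rough batch budget; for example, 10,000 images with 500-token prompts comes to about $325 under these prices.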

For detailed pricing configuration, see models/microsoft/mai/price/microsoft-mai-image-2.5-flash-text-to-image.json.

Best Use Cases

  • High-volume generation — Batch generation of marketing assets, product variants, or dataset images.
  • Rapid prototyping — Quickly iterate on visual concepts before committing to full-quality renders.
  • Real-time applications — Interactive generation in products where response time matters.
  • Social media content — Fast-turnaround visuals for campaigns and posts.
  • E-commerce at scale — Generate large product image catalogs cost-effectively.
  • Development & testing — Test prompts and pipelines without incurring full production costs.

Pro Tips

  • Be specific about lighting, perspective, and style (e.g., "soft golden-hour lighting", "top-down view", "photorealistic").
  • Mention the subject first, then environment, then style and mood.
  • For text in images, keep inscriptions short and clearly stated in the prompt.
  • Use aspect ratios suited to your use case (portrait for people, landscape for scenes).
  • For maximum quality, consider upgrading to MAI-Image-2.5 for final production outputs.
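
The ordering and aspect-ratio tips above can be sketched in Python. Both helpers are hypothetical illustrations, not part of any SDK: `build_prompt` simply joins the parts in the recommended order (subject, environment, style, mood), and `dimensions_for_ratio` finds the largest exact-ratio size within the 1,048,576-pixel limit listed under Technical Specifications. Whether the API accepts arbitrary width and height values is an assumption to verify against the actual API reference.

```python
import math

MAX_TOTAL_PIXELS = 1_048_576  # limit listed under Technical Specifications


def build_prompt(subject: str, environment: str = "", style: str = "", mood: str = "") -> str:
    """Join prompt parts in the recommended order, skipping empty ones."""
    parts = (subject, environment, style, mood)
    return ", ".join(p.strip() for p in parts if p and p.strip())


def dimensions_for_ratio(ratio_w: int, ratio_h: int) -> tuple[int, int]:
    """Largest (width, height) with exactly ratio_w:ratio_h within the pixel limit."""
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("ratio terms must be positive integers")
    # Reduce the ratio first so that 32:18 and 16:9 give the same answer.
    g = math.gcd(ratio_w, ratio_h)
    ratio_w, ratio_h = ratio_w // g, ratio_h // g
    # Largest integer k with (k * ratio_w) * (k * ratio_h) <= MAX_TOTAL_PIXELS.
    k = math.isqrt(MAX_TOTAL_PIXELS // (ratio_w * ratio_h))
    return k * ratio_w, k * ratio_h


if __name__ == "__main__":
    print(build_prompt("a ceramic mug", "on a wooden table",
                       "photorealistic", "soft golden-hour lighting"))
    print(dimensions_for_ratio(1, 1))   # square
    print(dimensions_for_ratio(16, 9))  # landscape for scenes
    print(dimensions_for_ratio(3, 4))   # portrait for people
```

Integer arithmetic (`math.isqrt`) keeps the result exactly on the requested ratio and never above the pixel budget, at the cost of landing slightly under the limit for non-square ratios.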

Technical Specifications

| Spec | Value |
| --- | --- |
| Model Developer | Microsoft AI |
| Release Date | June 2, 2026 |
| Input Format | Text prompt (natural language) |
| Output Format | PNG image (base64-encoded) |
| Max Prompt Length | 32,000 tokens |
| Max Output Resolution | 1,048,576 total pixels (e.g., 1024×1024) |
| Supported Languages | English (primary) |
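
Because the output is a base64-encoded PNG, a client has to decode it before saving or displaying it. The sketch below shows one way to do that with the Python standard library and to read the image dimensions from the PNG header. The function name `decode_png` is hypothetical, and the name of the response field that carries the base64 string is not documented here, so the helper takes the string directly.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_TOTAL_PIXELS = 1_048_576  # limit listed in the specifications table


def decode_png(b64_data: str) -> tuple[bytes, int, int]:
    """Decode a base64 PNG and return (raw_bytes, width, height).

    Only the signature and the IHDR header are inspected; the pixel
    data itself is not validated.
    """
    raw = base64.b64decode(b64_data, validate=True)
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", 4-byte width, 4-byte height.
    if len(raw) < 24 or raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("data is not a PNG image")
    width, height = struct.unpack(">II", raw[16:24])
    return raw, width, height


def save_png(b64_data: str, path: str) -> tuple[int, int]:
    """Decode the image, write it to `path`, and return its dimensions."""
    raw, width, height = decode_png(b64_data)
    if width * height > MAX_TOTAL_PIXELS:
        # Unexpected for this model according to the table above.
        raise ValueError(f"image is {width}x{height}, above the documented limit")
    with open(path, "wb") as f:
        f.write(raw)
    return width, height
```

Checking the header before writing the file catches truncated or non-image responses early, without requiring an imaging library.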
