
MAI-Image-2.5-Flash Text-to-Image API by Microsoft
Microsoft's fast, cost-optimized text-to-image generation model, creating high-quality images at lower cost using the same diffusion-based architecture as MAI-Image-2.5.
MAI-Image-2.5-Flash Text-to-Image
MAI-Image-2.5-Flash is Microsoft's fast, cost-optimized text-to-image generation model, designed to create high-quality, visually rich images from natural language prompts at significantly lower cost than the standard MAI-Image-2.5. It uses the same diffusion-based generative approach, which gives strong alignment between the input text and the generated output, while being optimized for speed and throughput. The model was released on June 2, 2026.
Key Capabilities
- Photorealistic image synthesis — Generates realistic imagery with consistent visual structure, accurate lighting, depth, and texture, suitable for concept visualization and professional content creation.
- High-fidelity portraits — Produces expressive, natural-looking portraits with accurate facial structure, lighting, and skin texture.
- Accurate text rendering — Improved rendering of legible text within generated images, including labels, posters, packaging, and signage.
- Visual reasoning — Reasons across objects, scene structure, lighting, scale, and spatial positioning to produce consistent outputs even from ambiguous or complex prompts.
- Product, branding & commercial design — Well suited for product imagery, marketing visuals, brand assets, and commercial creative workflows.
- Creative concept visualization — Translates abstract textual descriptions into visually coherent and imaginative outputs.
Flash vs. Standard
| Feature | MAI-Image-2.5 | MAI-Image-2.5-Flash |
|---|---|---|
| Output image cost | $0.05 / image | $0.03 / image |
| Speed | Standard | Faster |
| Quality | Maximum fidelity | High quality, optimized |
| Best for | Premium production | High-volume, cost-sensitive |
Flash is the recommended choice for high-volume workflows, rapid prototyping, and scenarios where speed and cost efficiency take priority over absolute maximum fidelity.
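As a rough illustration of the comparison table, the sketch below estimates how much a batch saves on per-image output fees when run on Flash instead of the standard model. Only the two image fees from the table are used. The function name `image_fee_savings` is hypothetical, and prompt-token charges are deliberately left out.

```python
# Per-image output fees taken from the Flash vs. Standard table.
STANDARD_IMAGE_FEE = 0.05  # MAI-Image-2.5, USD per image
FLASH_IMAGE_FEE = 0.03     # MAI-Image-2.5-Flash, USD per image


def image_fee_savings(num_images: int) -> tuple[float, float]:
    """Return (USD saved, fraction saved) on output image fees for a batch.

    Hypothetical helper: it ignores prompt-token charges entirely.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    standard_total = num_images * STANDARD_IMAGE_FEE
    flash_total = num_images * FLASH_IMAGE_FEE
    saved = standard_total - flash_total
    fraction = saved / standard_total if standard_total else 0.0
    # Round to cents and to four decimal places to hide float noise.
    return round(saved, 2), round(fraction, 4)


if __name__ == "__main__":
    saved, fraction = image_fee_savings(10_000)
    print(f"Saved ${saved:.2f} ({fraction:.0%}) on 10,000 images")
```

The fraction saved is the same for any batch size, so the absolute dollar figure is what matters when sizing a high-volume workload.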
Pricing
Pricing is based on two components: the input text tokens in the prompt, and a fixed per-image output fee.
| SKU | Description | Unit Price |
|---|---|---|
| sku_input_1m_token | Price per 1M input (prompt) tokens | $5.00 |
| sku_output_image | Fixed fee per generated image | $0.03 |
Pricing Formula
cost = countTokens(prompt) / 1,000,000 × $5.00 + $0.03
For most prompts (a few hundred tokens), the token cost is small relative to the image fee, so the effective cost is approximately $0.03 per image. The per-image fee is 40% lower than the $0.05 fee of the standard MAI-Image-2.5.
Examples
| Prompt Length | Token Count | Token Cost | Image Fee | Total |
|---|---|---|---|---|
| Short (e.g., 50 tokens) | 50 | ~$0.000250 | $0.03 | ~$0.0303 |
| Medium (e.g., 500 tokens) | 500 | ~$0.002500 | $0.03 | ~$0.0325 |
| Long (e.g., 2,000 tokens) | 2,000 | ~$0.010000 | $0.03 | ~$0.0400 |
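The pricing formula and the example rows can be reproduced with a few lines of Python. This is a minimal sketch: `estimate_cost` is a hypothetical helper, and the token count has to come from whatever tokenizer the service actually bills with, which this page does not specify. `Decimal` is used so that the small token charges are not distorted by floating-point rounding.

```python
from decimal import Decimal

# Unit prices from the SKU table.
INPUT_PRICE_PER_1M_TOKENS = Decimal("5.00")  # sku_input_1m_token
OUTPUT_IMAGE_FEE = Decimal("0.03")           # sku_output_image


def estimate_cost(prompt_tokens: int) -> Decimal:
    """Estimated USD cost of one generated image for a prompt of the given size.

    Hypothetical helper; prompt_tokens must be supplied by the caller.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    token_cost = Decimal(prompt_tokens) / Decimal(1_000_000) * INPUT_PRICE_PER_1M_TOKENS
    return token_cost + OUTPUT_IMAGE_FEE


if __name__ == "__main__":
    for tokens in (50, 500, 2_000):
        print(f"{tokens:>5} tokens -> ${estimate_cost(tokens):.4f}")
```

The exact total for the 50-token row is $0.03025, which the table shows rounded to about $0.0303.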
For detailed pricing configuration, see models/microsoft/mai/price/microsoft-mai-image-2.5-flash-text-to-image.json.
Best Use Cases
- High-volume generation — Batch generation of marketing assets, product variants, or dataset images.
- Rapid prototyping — Quickly iterate on visual concepts before committing to full-quality renders.
- Real-time applications — Interactive generation in products where response time matters.
- Social media content — Fast-turnaround visuals for campaigns and posts.
- E-commerce at scale — Generate large product image catalogs cost-effectively.
- Development & testing — Test prompts and pipelines without incurring full production costs.
Pro Tips
- Be specific about lighting, perspective, and style (e.g., "soft golden-hour lighting", "top-down view", "photorealistic").
- Mention the subject first, then environment, then style and mood.
- For text in images, keep inscriptions short and clearly stated in the prompt.
- Use aspect ratios suited to your use case (portrait for people, landscape for scenes).
- For maximum quality, consider upgrading to MAI-Image-2.5 for final production outputs.
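The ordering tip (subject first, then environment, then style and mood) can be enforced with a small prompt-building helper. The sketch below is only one way to do it. `build_prompt` and its parameter names are hypothetical, and the model does not require any particular prompt syntax.

```python
def build_prompt(subject: str, environment: str = "", style: str = "", mood: str = "") -> str:
    """Join prompt fragments in the order: subject, environment, style, mood.

    Hypothetical helper: empty fragments are skipped and stray trailing
    commas or periods are trimmed so the pieces join cleanly.
    """
    if not subject or not subject.strip():
        raise ValueError("subject is required")
    parts = [subject, environment, style, mood]
    cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
    return ", ".join(cleaned)


if __name__ == "__main__":
    prompt = build_prompt(
        subject="a ceramic coffee mug",
        environment="on a wooden table, top-down view",
        style="photorealistic",
        mood="soft golden-hour lighting",
    )
    print(prompt)
```

Keeping the fragments separate also makes it easy to vary one element, such as lighting, while holding the subject fixed across a batch.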
Technical Specifications
| Spec | Value |
|---|---|
| Model Developer | Microsoft AI |
| Release Date | June 2, 2026 |
| Input Format | Text prompt (natural language) |
| Output Format | PNG image (base64-encoded) |
| Max Prompt Length | 32,000 tokens |
| Max Output Resolution | 1,048,576 total pixels (e.g., 1024×1024) |
| Supported Languages | English (primary) |
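Two of the specifications above lend themselves to simple client-side checks: the 1,048,576-pixel output budget and the base64-encoded PNG output. The following sketch uses only the Python standard library. The function names are hypothetical, and how the base64 string is extracted from the API response is not covered here because the response schema is not described on this page.

```python
import base64
import struct

MAX_TOTAL_PIXELS = 1_048_576            # from the specification table
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"    # fixed 8-byte PNG file signature


def fits_pixel_budget(width: int, height: int) -> bool:
    """True if a requested size stays within the maximum total pixel count."""
    return width > 0 and height > 0 and width * height <= MAX_TOTAL_PIXELS


def png_dimensions(image_b64: str) -> tuple[int, int]:
    """Decode a base64 PNG and read (width, height) from its IHDR chunk."""
    data = base64.b64decode(image_b64)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("data is not a valid PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


if __name__ == "__main__":
    print(fits_pixel_budget(1024, 1024))  # square image at the limit
    print(fits_pixel_budget(1280, 768))   # landscape, under the limit
    print(fits_pixel_budget(1920, 1080))  # exceeds the limit
```

Once the dimensions check out, the decoded bytes from `base64.b64decode` can be written directly to a `.png` file.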
Related Models
- MAI-Image-2.5-Flash Edit — Same Flash model with image-to-image editing capability.
- MAI-Image-2.5 Text-to-Image — Full-quality text-to-image variant.
- MAI-Image-2.5 Edit — Full-quality image editing variant.