
MAI-Image-2.5 Text-to-Image API by Microsoft
MAI-Image-2.5 Text-to-Image
MAI-Image-2.5 is Microsoft's flagship text-to-image generation model, designed to create high-quality, visually rich images from natural language prompts. It uses a diffusion-based generative approach to progressively refine images, enabling strong alignment between the input text and the generated output. Released on June 2, 2026, it ranks among the top-performing image generation models globally.
Key Capabilities
- Photorealistic image synthesis — Generates realistic imagery with consistent visual structure, accurate lighting, depth, and texture, suitable for concept visualization and professional content creation.
- High-fidelity portraits — Produces expressive, natural-looking portraits with accurate facial structure, lighting, and skin texture.
- Accurate text rendering — Significantly improved rendering of legible text within generated images, including labels, posters, packaging, and signage.
- Visual reasoning — Reasons across objects, scene structure, lighting, scale, and spatial positioning to produce consistent outputs even from ambiguous or complex prompts.
- Product, branding & commercial design — Well suited for product imagery, marketing visuals, brand assets, and commercial creative workflows.
- Creative concept visualization — Translates abstract textual descriptions into visually coherent and imaginative outputs.
Pricing
Pricing is based on two components: the input text tokens in the prompt, and a fixed per-image output fee.
| SKU | Description | Unit Price |
|---|---|---|
| sku_input_1m_token | Price per 1M input (prompt) tokens | $5.00 |
| sku_output_image | Fixed fee per generated image | $0.05 |
Pricing Formula
cost = countTokens(prompt) / 1,000,000 × $5.00 + $0.05
For most prompts (a few hundred tokens), the token cost is negligible and the effective cost is approximately $0.05 per image.
Examples
| Prompt Length | Token Count | Token Cost | Image Fee | Total |
|---|---|---|---|---|
| Short (e.g., 50 tokens) | 50 | ~$0.000250 | $0.05 | ~$0.0503 |
| Medium (e.g., 500 tokens) | 500 | ~$0.002500 | $0.05 | ~$0.0525 |
| Long (e.g., 2,000 tokens) | 2,000 | ~$0.010000 | $0.05 | ~$0.0600 |
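The formula and the example rows above can be reproduced with a short sketch. The function name `estimate_cost` is hypothetical, and the token count is taken as an input because this page does not specify the tokenizer behind `countTokens`. `Decimal` is used so the small per-token amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal

# Unit prices copied from the SKU table above.
PRICE_PER_1M_INPUT_TOKENS = Decimal("5.00")  # sku_input_1m_token
PRICE_PER_IMAGE = Decimal("0.05")            # sku_output_image


def estimate_cost(token_count: int) -> Decimal:
    """Estimate the USD cost of generating one image.

    token_count is the number of prompt tokens. How tokens are counted
    is an assumption left to the caller; this page does not document
    the tokenizer.
    """
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    token_cost = Decimal(token_count) / Decimal(1_000_000) * PRICE_PER_1M_INPUT_TOKENS
    return token_cost + PRICE_PER_IMAGE


if __name__ == "__main__":
    # Same prompt lengths as the example table.
    for tokens in (50, 500, 2_000):
        print(tokens, estimate_cost(tokens))
```

Because the per-image fee dominates, a rough budget for a batch can usually be approximated as the number of images multiplied by $0.05.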
For detailed pricing configuration, see models/microsoft/mai/price/microsoft-mai-image-2.5-text-to-image.json.
Best Use Cases
- Marketing & Advertising — Generate product visuals, campaign imagery, and promotional assets.
- Creative Content — Concept art, illustrations, book covers, and editorial imagery.
- E-commerce — Product visualization and lifestyle photography alternatives.
- Presentations — Custom visuals for slides, reports, and pitch decks.
- Prototyping — Rapid visual mockups for design and UX workflows.
- Signage & Packaging — Designs that require legible in-image text rendering.
Pro Tips
- Be specific about lighting, perspective, and style (e.g., "soft golden-hour lighting", "top-down view", "photorealistic").
- Mention the subject first, then environment, then style and mood.
- For text in images, keep inscriptions short and clearly stated in the prompt.
- Use aspect ratios suited to your use case (portrait for people, landscape for scenes).
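The ordering advice in these tips (subject first, then environment, then style and mood, with any in-image text stated briefly) can be sketched as a small prompt builder. The function `build_prompt` and its parameter names are hypothetical and not part of any Microsoft API; the output is just a string that could be sent as the text prompt.

```python
def build_prompt(subject: str, environment: str = "", style: str = "",
                 mood: str = "", inscription: str = "") -> str:
    """Assemble a prompt in the order: subject, environment, style, mood.

    All names here are illustrative. Empty fields are skipped, and an
    optional short inscription is quoted so the desired in-image text
    is stated clearly.
    """
    if not subject.strip():
        raise ValueError("subject is required")
    parts = [subject.strip()]
    for piece in (environment, style, mood):
        if piece.strip():
            parts.append(piece.strip())
    prompt = ", ".join(parts)
    if inscription.strip():
        prompt += f', with the text "{inscription.strip()}"'
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        subject="a ceramic coffee mug",
        environment="on a wooden table, top-down view",
        style="photorealistic",
        mood="soft golden-hour lighting",
    ))
```

Keeping the fields separate makes it easy to vary one element, such as the lighting, while holding the subject and environment fixed.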
Technical Specifications
| Spec | Value |
|---|---|
| Model Developer | Microsoft AI |
| Release Date | June 2, 2026 |
| Input Format | Text prompt (natural language) |
| Output Format | PNG image (base64-encoded) |
| Max Prompt Length | 32,000 tokens |
| Max Output Resolution | 1,048,576 total pixels (e.g., 1024×1024) |
| Supported Languages | English (primary) |
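As an illustration of the pixel budget in the table, the sketch below computes the largest width and height with an exact aspect ratio whose product stays within 1,048,576 pixels. The helper `largest_size` is hypothetical, and whether the API accepts arbitrary dimensions (rather than a fixed set of sizes) is an assumption this page does not confirm.

```python
import math

MAX_PIXELS = 1_048_576  # "Max Output Resolution" from the table above


def largest_size(aspect_w: int, aspect_h: int) -> tuple[int, int]:
    """Return the largest (width, height) with the exact ratio
    aspect_w:aspect_h such that width * height <= MAX_PIXELS.

    Illustrative only: the API may restrict sizes further.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    # Reduce the ratio so that 32:18 behaves the same as 16:9.
    g = math.gcd(aspect_w, aspect_h)
    aspect_w, aspect_h = aspect_w // g, aspect_h // g
    # Largest integer k with (aspect_w * k) * (aspect_h * k) <= MAX_PIXELS.
    k = math.isqrt(MAX_PIXELS // (aspect_w * aspect_h))
    if k == 0:
        raise ValueError("aspect ratio too extreme for the pixel budget")
    return aspect_w * k, aspect_h * k


if __name__ == "__main__":
    for ratio in ((1, 1), (16, 9), (3, 4)):
        print(ratio, largest_size(*ratio))
```

Integer arithmetic with `math.isqrt` avoids floating-point edge cases near the budget boundary, so a square request resolves to exactly 1024×1024.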
Related Models
- MAI-Image-2.5 Edit — Same model with image-to-image editing capability.
- MAI-Image-2.5-Flash Text-to-Image — Faster, lower-cost text-to-image variant.
- MAI-Image-2.5-Flash Edit — Faster, lower-cost image editing variant.

















