MAI-Image-2.5-Flash Text-to-Image API by Microsoft

microsoft/mai-image-2.5-flash/text-to-image
Text-to-image

Microsoft's fast, cost-optimized text-to-image generation model, creating high-quality images at lower cost using the same diffusion-based architecture as MAI-Image-2.5.

MAI-Image-2.5-Flash Text-to-Image

MAI-Image-2.5-Flash is Microsoft's fast, cost-optimized text-to-image generation model. It creates high-quality, visually rich images from natural language prompts at a lower per-image fee than the standard MAI-Image-2.5 ($0.03 versus $0.05). It uses the same diffusion-based generative approach, which gives strong alignment between the input text and the generated output, and it is optimized for speed and throughput. It was released on June 2, 2026.

Key Capabilities

  • Photorealistic image synthesis — Generates realistic imagery with consistent visual structure, accurate lighting, depth, and texture, suitable for concept visualization and professional content creation.
  • High-fidelity portraits — Produces expressive, natural-looking portraits with accurate facial structure, lighting, and skin texture.
  • Accurate text rendering — Improved rendering of legible text within generated images, including labels, posters, packaging, and signage.
  • Visual reasoning — Reasons across objects, scene structure, lighting, scale, and spatial positioning to produce consistent outputs even from ambiguous or complex prompts.
  • Product, branding & commercial design — Well suited for product imagery, marketing visuals, brand assets, and commercial creative workflows.
  • Creative concept visualization — Translates abstract textual descriptions into visually coherent and imaginative outputs.

Flash vs. Standard

| Feature | MAI-Image-2.5 | MAI-Image-2.5-Flash |
| --- | --- | --- |
| Output image cost | $0.05 / image | $0.03 / image |
| Speed | Standard | Faster |
| Quality | Maximum fidelity | High quality, optimized |
| Best for | Premium production | High-volume, cost-sensitive |

Flash is the recommended choice for high-volume workflows, rapid prototyping, and scenarios where speed and cost efficiency take priority over absolute maximum fidelity.
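
To make the trade-off concrete, the following minimal sketch compares per-image output fees for a batch, using only the two prices listed in the comparison above. It ignores prompt token charges, because this page does not state the token price for the standard model. The function name `compare_image_fees` is hypothetical and not part of any API.

```python
from decimal import Decimal

# Per-image output fees taken from the Flash vs. Standard comparison.
STANDARD_IMAGE_FEE = Decimal("0.05")  # MAI-Image-2.5
FLASH_IMAGE_FEE = Decimal("0.03")     # MAI-Image-2.5-Flash


def compare_image_fees(image_count: int) -> dict[str, Decimal]:
    """Return image-fee totals for both tiers (prompt token costs excluded)."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    standard = STANDARD_IMAGE_FEE * image_count
    flash = FLASH_IMAGE_FEE * image_count
    return {"standard": standard, "flash": flash, "savings": standard - flash}


if __name__ == "__main__":
    fees = compare_image_fees(10_000)
    print(f"Standard: ${fees['standard']}, Flash: ${fees['flash']}, "
          f"saved: ${fees['savings']}")
```

For a 10,000-image batch, the image fees come to $500 on the standard model and $300 on Flash, a difference of $200 before token charges.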

Pricing

Pricing is based on two components: the input text tokens in the prompt, and a fixed per-image output fee.

| SKU | Description | Unit Price |
| --- | --- | --- |
| sku_input_1m_token | Price per 1M input (prompt) tokens | $5.00 |
| sku_output_image | Fixed fee per generated image | $0.03 |

Pricing Formula

cost = countTokens(prompt) / 1,000,000 × $5.00 + $0.03

For most prompts (a few hundred tokens), the token cost is negligible and the effective cost is approximately $0.03 per image — 40% lower than the standard MAI-Image-2.5.
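
The formula can be sketched as a small estimator. This is an illustration under stated assumptions, not billing code: the function name `estimate_cost` is hypothetical, and the token count is taken as an input because this page does not specify which tokenizer `countTokens` uses. `Decimal` is used to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Unit prices from the SKU table.
INPUT_PRICE_PER_1M_TOKENS = Decimal("5.00")  # sku_input_1m_token
OUTPUT_IMAGE_FEE = Decimal("0.03")           # sku_output_image
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int) -> Decimal:
    """Estimate the cost in USD of generating one image.

    prompt_tokens is assumed to come from whatever tokenizer the service
    bills with; that tokenizer is not documented on this page.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    token_cost = Decimal(prompt_tokens) / TOKENS_PER_UNIT * INPUT_PRICE_PER_1M_TOKENS
    return token_cost + OUTPUT_IMAGE_FEE


if __name__ == "__main__":
    for tokens in (50, 500, 2_000):
        print(tokens, estimate_cost(tokens))
```

With a 500-token prompt, the token component is $0.0025 against a $0.03 image fee, which is why the image fee dominates for typical prompts.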

Examples

| Prompt Length | Token Count | Token Cost | Image Fee | Total |
| --- | --- | --- | --- | --- |
| Short (e.g., 50 tokens) | 50 | ~$0.000250 | $0.03 | ~$0.0303 |
| Medium (e.g., 500 tokens) | 500 | ~$0.002500 | $0.03 | ~$0.0325 |
| Long (e.g., 2,000 tokens) | 2,000 | ~$0.010000 | $0.03 | ~$0.0400 |

For detailed pricing configuration, see models/microsoft/mai/price/microsoft-mai-image-2.5-flash-text-to-image.json.

Best Use Cases

  • High-volume generation — Batch generation of marketing assets, product variants, or dataset images.
  • Rapid prototyping — Quickly iterate on visual concepts before committing to full-quality renders.
  • Real-time applications — Interactive generation in products where response time matters.
  • Social media content — Fast-turnaround visuals for campaigns and posts.
  • E-commerce at scale — Generate large product image catalogs cost-effectively.
  • Development & testing — Test prompts and pipelines without incurring full production costs.

Pro Tips

  • Be specific about lighting, perspective, and style (e.g., "soft golden-hour lighting", "top-down view", "photorealistic").
  • Mention the subject first, then environment, then style and mood.
  • For text in images, keep inscriptions short and clearly stated in the prompt.
  • Use aspect ratios suited to your use case (portrait for people, landscape for scenes).
  • For maximum quality, consider upgrading to MAI-Image-2.5 for final production outputs.
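
The ordering advice above (subject, then environment, then style and mood, with short inscriptions) can be captured in a small prompt-assembly helper. This is a sketch only: `build_prompt` and its parameters are hypothetical, and the six-word inscription limit is an arbitrary illustrative threshold, not a documented model constraint.

```python
from typing import Optional


def build_prompt(
    subject: str,
    environment: str = "",
    style: str = "",
    mood: str = "",
    inscription: Optional[str] = None,
    max_inscription_words: int = 6,  # illustrative threshold, not a model limit
) -> str:
    """Assemble a prompt in the order: subject, environment, style, mood, text."""
    if not subject.strip():
        raise ValueError("subject is required")
    # Keep only the non-empty parts, preserving the recommended order.
    parts = [p.strip() for p in (subject, environment, style, mood) if p.strip()]
    if inscription is not None:
        text = inscription.strip()
        if len(text.split()) > max_inscription_words:
            raise ValueError("keep in-image text short")
        # State the inscription explicitly and in quotes.
        parts.append(f'with the text "{text}"')
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        subject="a ceramic coffee mug",
        environment="on a marble countertop, top-down view",
        style="photorealistic, soft golden-hour lighting",
        mood="calm",
        inscription="GOOD MORNING",
    ))
```

Keeping the parts separate also makes it easy to vary one element, such as lighting, while holding the rest of the prompt fixed during iteration.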

Technical Specifications

| Spec | Value |
| --- | --- |
| Model Developer | Microsoft AI |
| Release Date | June 2, 2026 |
| Input Format | Text prompt (natural language) |
| Output Format | PNG image (base64-encoded) |
| Max Prompt Length | 32,000 tokens |
| Max Output Resolution | 1,048,576 total pixels (e.g., 1024×1024) |
| Supported Languages | English (primary) |
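
Two of these specifications lend themselves to a short client-side sketch: choosing dimensions that fit the 1,048,576-pixel budget for a given aspect ratio, and reading the size of a base64-encoded PNG result. Both helpers are hypothetical. This page does not document which width and height values the API accepts or how the response is structured, so the code only handles the arithmetic and the standard PNG header layout.

```python
import base64
import math
import struct

MAX_PIXELS = 1_048_576  # Max Output Resolution from the specification table
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fit_dimensions(ratio_w: int, ratio_h: int, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Largest (width, height) with the exact ratio whose area fits max_pixels.

    Assumption: arbitrary integer dimensions are acceptable; the service may
    in practice restrict sizes to a fixed set.
    """
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("ratio components must be positive")
    # width = k * ratio_w and height = k * ratio_h, so area = k^2 * ratio_w * ratio_h.
    k = math.isqrt(max_pixels // (ratio_w * ratio_h))
    if k == 0:
        raise ValueError("ratio too extreme for the pixel budget")
    return k * ratio_w, k * ratio_h


def png_size_from_base64(encoded: str) -> tuple[int, int]:
    """Decode a base64 PNG and read width and height from its IHDR chunk."""
    raw = base64.b64decode(encoded)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(raw) < 24 or raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a valid PNG payload")
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


if __name__ == "__main__":
    print(fit_dimensions(1, 1))
    print(fit_dimensions(16, 9))
    print(fit_dimensions(3, 4))
```

Using integer arithmetic (`math.isqrt`) keeps the result at or under the pixel budget without floating-point rounding surprises. A 1:1 ratio yields 1024×1024, matching the example in the table.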
