

ERNIE-Image is an open-weight text-to-image model developed by the ERNIE-Image Team at Baidu. It is built on a single-stream Diffusion Transformer (DiT) with 8B parameters and paired with a lightweight Prompt Enhancer that rewrites short prompts into richer, more structured descriptions before passing them to the diffusion backbone (NYU Shanghai RITS). Released on April 15, 2026 under the Apache 2.0 license, it turns natural language descriptions into detailed imagery, with particular strength in text rendering and structured layout generation. ERNIE-Image is designed not only for visual quality but also for controllability in practical generation scenarios where accurate content realization matters as much as aesthetics. That makes it well suited to commercial posters, comics, multi-panel layouts, and other content creation tasks that require both visual quality and precise control.
Atlas Cloud brings you the latest industry-leading creative models.
Lowest cost
| Modality | Description |
|---|---|
| ERNIE-Image API (Text To Image) | The flagship quality-focused model. The SFT variant runs at guidance scale 4.0 with 50 inference steps for maximum quality (24-7 Press Release) and is optimized for final production assets including posters, editorial graphics, and commercial layouts. |
| ERNIE-Image Turbo API (Text To Image) | The Turbo variant, optimized through DMD (Distribution Matching Distillation) and reinforcement learning, cuts inference from 50 steps to 8, a speedup of more than 6x, while maintaining high-quality output (Stable Learn). Ideal for rapid iteration and high-volume workflows. |
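
The step counts in the table can be checked with a little arithmetic. The sketch below is illustrative only: `SamplingPreset`, `step_reduction`, and `pick_preset` are hypothetical names, not part of any ERNIE-Image or Atlas Cloud SDK, and the Turbo guidance scale is left unset because this page does not state it. The ratio it computes is a reduction in step count, which is consistent with the "more than 6x" figure but is not a measured latency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplingPreset:
    """Sampling settings for one variant (hypothetical container, not an official type)."""
    name: str
    num_inference_steps: int
    guidance_scale: Optional[float]  # None where this page gives no value


# Values copied from the table above; nothing else is assumed.
STANDARD = SamplingPreset("ERNIE-Image", num_inference_steps=50, guidance_scale=4.0)
TURBO = SamplingPreset("ERNIE-Image Turbo", num_inference_steps=8, guidance_scale=None)


def step_reduction(base: SamplingPreset, fast: SamplingPreset) -> float:
    """Return how many times fewer denoising steps `fast` runs than `base`."""
    return base.num_inference_steps / fast.num_inference_steps


def pick_preset(final_asset: bool) -> SamplingPreset:
    """Choose the quality-focused preset for final assets, Turbo for drafts."""
    return STANDARD if final_asset else TURBO


if __name__ == "__main__":
    print(step_reduction(STANDARD, TURBO))  # 6.25
    print(pick_preset(final_asset=False).name)
```

Keeping the two presets as data makes it easy to draft with Turbo and re-render the chosen prompt with the standard settings.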
Pairing advanced models with Atlas Cloud's GPU-accelerated platform delivers unmatched speed, scalability, and creative control for image and video generation.

ERNIE-Image leads the open-source field with a LongTextBench score of 0.9733, rendering accurate text inside images, including comic speech bubbles, poster headlines, infographic labels, and UI mockup copy. If your use case requires legible, correctly spelled text baked into the image, ERNIE-Image is the clear leader.

The codebase exposes generation, edit, composite, and upscale primitives so designers can centralize an asset pipeline (Let's Data Science). By understanding spatial relationships and grid-based arrangements, the model generates coherent multi-panel sequential artwork and formatted designs.

Both English and Chinese prompts are natively supported through the same encoder pipeline (24-7 Press Release), capturing cultural nuances and idiomatic expressions across languages for authentic visual storytelling.

ERNIE-Image generates print-ready marketing materials with embedded typography, product placements, and professional layouts. For creatives and product teams, ERNIE-Image lowers the barrier to production-grade poster, comic, storyboard, and UI asset generation without license friction.
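Explore real-world use cases and workflows you can build with this model family, from content creation and automation to production-grade applications.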
Generate campaign-ready posters, banners, and promotional materials with embedded text, product visuals, and professional layouts at high throughput — suitable for both quick drafts (Turbo) and final assets (Standard).
Create book covers, magazine illustrations, and editorial graphics with precise typography and artistic consistency. The industry-leading text rendering makes it ideal for text-heavy publication designs.
ERNIE-Image lowers the barrier to production-grade comic, storyboard, and sequential art generation (Let's Data Science), with consistent character representation and integrated dialogue that streamline production for independent creators and studios.
Generate realistic application screenshots, website mockups, and interface designs with readable text elements and coherent layout structures for presentation and prototyping.
ERNIE-Image performs strongly on complex instruction following and text rendering (GitHub), making it well suited to visually engaging educational materials, data visualizations, and explainer graphics that combine imagery with clear, legible annotations.
Develop character designs, environment concepts, and promotional artwork with cinematic quality and consistent style — supporting both indie and professional production pipelines.
Compare models across vendors on performance, pricing, and unique strengths to make an informed decision.
| Model | Reference Image Limit | Output Num | Resolution | Aspect Ratio |
|---|---|---|---|---|
| ERNIE-Image | 0 (T2I) | 1–8 | 1024×1024 | 1:1 |
| ERNIE-Image Turbo | 0 (T2I) | 1–8 | 1024×1024 | 1:1 |
| Qwen-Image | 3 | 1–6 | 512P~2K | Width[512, 2048]px; Height[512, 2048]px |
| Flux.1 | 1 | 1 | 256P~4K | Width[256, 4096]px; Height[256, 4096]px |
| Seedream 5.0 | 14 | 1~15 | 2K~4K+ | 1:1 3:2 2:3 3:4 4:3 4:5 5:4 9:16 16:9 21:9 |
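
The ERNIE-Image rows of the comparison table can be expressed as a pre-flight check that catches out-of-range requests before any call is made. This is a minimal sketch under stated assumptions: `LIMITS`, `validate_request`, and the parameter names are hypothetical and do not come from the Atlas Cloud API, and the limits are only those listed in the table above.

```python
from typing import Dict, List

# Limits copied from the comparison table (ERNIE-Image rows only).
# The dictionary layout and key names are illustrative, not an official schema.
LIMITS: Dict[str, dict] = {
    "ERNIE-Image": {
        "max_reference_images": 0,   # text-to-image only
        "outputs": (1, 8),           # inclusive range of images per request
        "sizes": {(1024, 1024)},     # 1:1 at 1024x1024
    },
    "ERNIE-Image Turbo": {
        "max_reference_images": 0,
        "outputs": (1, 8),
        "sizes": {(1024, 1024)},
    },
}


def validate_request(model: str, width: int, height: int,
                     num_outputs: int, num_reference_images: int = 0) -> List[str]:
    """Return a list of human-readable problems; an empty list means the request fits the table."""
    limits = LIMITS.get(model)
    if limits is None:
        return [f"unknown model: {model}"]

    errors: List[str] = []
    if num_reference_images > limits["max_reference_images"]:
        errors.append("reference images are not supported (text-to-image only)")
    low, high = limits["outputs"]
    if not low <= num_outputs <= high:
        errors.append(f"num_outputs must be between {low} and {high}")
    if (width, height) not in limits["sizes"]:
        errors.append(f"unsupported size {width}x{height}")
    return errors


if __name__ == "__main__":
    print(validate_request("ERNIE-Image", 1024, 1024, num_outputs=4))
    print(validate_request("ERNIE-Image Turbo", 2048, 1024, num_outputs=9))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a request at once.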
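Get started in minutes: follow these simple steps to integrate and deploy models through the Atlas Cloud platform.
Sign up at atlascloud.ai and complete verification. New users receive free credits to explore the platform and test models.
Pairing the advanced ERNIE Image Models with Atlas Cloud's GPU-accelerated platform delivers unmatched performance, scalability, and developer experience.
Low latency:
GPU-optimized inference for real-time responses.
Unified API:
Integrate once and use ERNIE Image Models, GPT, Gemini, and DeepSeek.
Transparent pricing:
Pay per token, with serverless mode supported.
Developer experience:
SDKs, data analytics, fine-tuning tools, and templates, all in one place.
Reliability:
99.99% availability, RBAC access control, and compliance logging.
Security and compliance:
SOC 2 Type II certification, HIPAA compliance, and US data sovereignty.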
A: ERNIE-Image achieves top-tier image rendering on consumer-grade GPUs. It excels at following complex instructions and at multi-language text rendering, with overall capabilities comparable to top-tier closed-source models (CnTechPost). Its particular strengths in text rendering (LongTextBench 0.9733) and structured layout generation for comics, posters, and infographics set it apart from general-purpose open models.
A: Both English and Chinese text rendering score above 0.96 on LongTextBench. FLUX.2 collapses in Chinese scenarios (scoring 0.2183), while ERNIE-Image remains stable (Stable Learn), handling Simplified Chinese, Traditional Chinese, and mixed bilingual content with high accuracy.
A: Yes. ERNIE-Image is released under the Apache 2.0 license (GitHub), which permits commercial use, modification, and distribution. Generated images can be used in advertising, merchandise, publications, and commercial applications.
Join the Discord community for the latest model updates, prompts, and support.