ERNIE Image API for Readable Text in Images

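The ERNIE Image API brings Baidu's open-weight 8B Diffusion Transformer, released by the ERNIE-Image Team under Apache 2.0, into your stack. It posts the top LongTextBench score among open-weight models (0.9733), which keeps poster headlines and comic speech bubbles legible, while the distilled Turbo variant cuts inference from 50 steps to 8. Atlas Cloud serves it through one OpenAI-compatible endpoint with transparent pay-as-you-go pricing. Start building today.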

Explore Leading Models

Atlas Cloud gives you the latest industry-leading creative models.

Baidu ERNIE Image Turbo Text-to-image (New · Text-to-image · Turbo)

A fast, low-latency version of ERNIE Image by Baidu, optimized for rapid iteration and scalable image generation. It balances speed and quality, which makes it well suited to real-time and high-throughput scenarios.

Free · Peak speed · Lowest cost

Endpoint (Modality)	Description
ERNIE Image API (Text To Image)	Where the Turbo endpoint prioritizes throughput, the standard ERNIE Image API leans toward maximum output fidelity on the same text-to-image task. It suits final production work such as posters, editorial graphics, and commercial layouts, where getting every detail right outweighs turnaround time.
ERNIE Image Turbo API (Text To Image)	Turn a single text prompt into as many as ten images per request across seven aspect ratios, from a 1024x1024 square up to 1376 pixels on the long edge. Tuned for low latency, it defaults to eight inference steps and ships with a built-in Prompt Enhancer that expands terse prompts before generation. Reach for it when rapid iteration, real-time previews, and high-volume batch runs matter more than squeezing out the last increment of quality.


Built for Text, Layout, and Control: the ERNIE Image API

From industry-leading text rendering and structured multi-panel layouts to native bilingual prompting, a default prompt enhancer, seven output dimensions, and reproducible Turbo batches, the ERNIE Image API turns precise instructions into production-ready imagery.

Legible Text Rendering with the ERNIE Image API

A leading LongTextBench score of 0.9733 lets the model render legible, correctly spelled text straight into generated images. Comic speech bubbles, poster headlines, infographic labels, and UI mockup copy all stay sharp and readable.

Structured, Multi-Panel Layouts

Generation, edit, composite, and upscale primitives work alongside the model's grasp of grid-based spatial relationships. Together they yield coherent multi-panel sequences and formatted designs that designers can drive through one centralized pipeline.

Bilingual Prompting in the ERNIE Image API

Both English and Chinese prompts run natively through the same encoder pipeline, capturing idiomatic phrasing in either language. This dual fluency supports authentic visual storytelling for global campaigns and localized content alike.
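As an illustration of that workflow, here is a minimal Python sketch that builds one request body per language for the same brief. It assumes a JSON body with `prompt`, `size`, and `seed` fields (the FAQ on this page names `size` and `seed`) and uses a hypothetical model ID, `ernie-image-turbo`; check the real schema before relying on it. Every setting except the prompt is shared, and `ensure_ascii=False` keeps the Chinese copy readable in the serialized body.

```python
import json

MODEL_ID = "ernie-image-turbo"  # hypothetical model ID, not confirmed by this page


def localized_payloads(prompts: dict[str, str], size: str = "1024x1024", seed: int = 42) -> dict[str, str]:
    """Build one JSON request body per language; only the prompt differs."""
    bodies = {}
    for lang, prompt in prompts.items():
        # Field names are assumptions based on the parameters this page mentions.
        body = {"model": MODEL_ID, "prompt": prompt, "size": size, "seed": seed}
        # ensure_ascii=False keeps Chinese characters as-is instead of escaping them.
        bodies[lang] = json.dumps(body, ensure_ascii=False)
    return bodies


campaign = {
    "en": 'Spring sale poster with the headline "Fresh Picks, 30% Off"',
    "zh": "春季促销海报，标题为“新鲜好物，七折优惠”",
}
for lang, body in localized_payloads(campaign).items():
    print(lang, body)
```

Sharing the seed and size keeps the two variants as comparable as the model allows, though it cannot guarantee identical compositions because the prompts themselves differ.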

Prompt Enhancer Enabled by Default

Enabled by default, a lightweight Prompt Enhancer rewrites short inputs into richer, structured descriptions before they reach the diffusion backbone. Switch it off per request with the use_pe flag whenever literal control over exact wording matters more.
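A minimal sketch of a per-request toggle, assuming the `use_pe` flag named in the FAQ on this page accepts a JSON boolean and that omitting it leaves the enhancer on. The model ID is hypothetical. The helper only sends the flag when exact wording matters, such as a headline that must appear verbatim.

```python
import json

MODEL_ID = "ernie-image-turbo"  # hypothetical model ID


def build_request(prompt: str, literal: bool = False) -> dict:
    """Return a request body; literal=True asks the API to skip the Prompt Enhancer."""
    body = {"model": MODEL_ID, "prompt": prompt}
    if literal:
        # The enhancer is on by default, so the flag is only sent to switch it off.
        # Treating use_pe as a JSON boolean is an assumption.
        body["use_pe"] = False
    return body


# Short creative brief: let the enhancer expand it.
print(json.dumps(build_request("a cat napping on a sunny windowsill")))
# Exact copy matters: keep the wording literal.
print(json.dumps(build_request('Poster whose headline reads exactly "OPEN 24 HOURS"', literal=True)))
```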

Seven Native Output Dimensions

Seven native output sizes span a square 1024x1024, landscape framings up to 1376x768, and portrait framings up to 768x1376. Each ratio is generated directly, so framing stays intact across every format.
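The arithmetic behind these presets can be checked in a few lines of Python. This sketch covers only the three sizes named on this page (the other four are not listed) and reports orientation, megapixels, and reduced aspect ratio. It shows that 1376x768 reduces to 43:24, about 1.79:1, which is close to but not exactly 16:9, and that the square and landscape presets carry almost the same pixel budget (1,048,576 versus 1,056,768 pixels).

```python
from fractions import Fraction

# Only these three of the seven presets are named on this page.
DOCUMENTED_SIZES = ["1024x1024", "1376x768", "768x1376"]


def describe(size: str) -> tuple[int, int, float, str]:
    """Return width, height, megapixels, and orientation for a 'WxH' size string."""
    width, height = (int(part) for part in size.split("x"))
    megapixels = width * height / 1_000_000
    if width == height:
        orientation = "square"
    elif width > height:
        orientation = "landscape"
    else:
        orientation = "portrait"
    return width, height, megapixels, orientation


for size in DOCUMENTED_SIZES:
    w, h, mp, kind = describe(size)
    # Fraction reduces the ratio to lowest terms, e.g. 1376/768 -> 43/24.
    print(f"{size}: {kind}, {mp:.2f} MP, ratio {Fraction(w, h)} (about {w / h:.2f}:1)")
```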

The ERNIE Image API in Turbo Mode

Need volume without the wait? Turbo mode runs as few as 8 inference steps and returns up to 10 images per request, while an explicit seed keeps every result reproducible.
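One way to keep a large Turbo job reproducible is to split it into requests of at most ten images and derive each request's seed from a single base seed. This is a minimal sketch, not an official client: the `n` field for the image count follows the OpenAI images convention and is an assumption, and so is the premise that a fixed seed reproduces every image in a multi-image request.

```python
MAX_IMAGES_PER_REQUEST = 10  # Turbo limit stated on this page


def batch_requests(prompt: str, total_images: int, base_seed: int) -> list[dict]:
    """Split a job into requests of at most 10 images, each with its own fixed seed.

    Seeds are base_seed + request index, so rerunning the job with the same
    base_seed rebuilds exactly the same request bodies.
    """
    if total_images < 1:
        raise ValueError("total_images must be at least 1")
    bodies = []
    remaining = total_images
    index = 0
    while remaining > 0:
        n = min(MAX_IMAGES_PER_REQUEST, remaining)
        # "n" as the image-count field is an assumption (OpenAI-style naming).
        bodies.append({"prompt": prompt, "n": n, "seed": base_seed + index})
        remaining -= n
        index += 1
    return bodies


job = batch_requests("Product hero shot of a glass teapot", total_images=25, base_seed=1000)
for body in job:
    print(body["n"], body["seed"])
```

Because the seeds are derived rather than random, rerunning the function with the same arguments yields identical request bodies, so a single failed chunk can be regenerated without touching the others.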

ERNIE Image Head to Head: One Prompt, Three Models

Feed the exact same brief to the flagship ERNIE Image model, a popular rival, and its faster sibling, then judge how each one renders typography, layout, and light side by side.

Prompt

Top-down flatlay still life, camera locked perfectly overhead looking straight down onto a weathered pale-elm apothecary counter of a traditional Chinese herbal-tea dispensary. Hard directional late-morning window light rakes in low from the right, the true protagonist of the frame — casting long, crisp, elongated shadows that stretch leftward across the raw wood grain and act as leading lines. On the dense right side, tightly clustered clear glass jars glow as the sun passes through them: translucent dried chrysanthemum buds, red goji berries, curled amber tangerine peel (chenpi), and deep crimson dried roselle petals catching the light. A small oxidized brass hand-balance scale with matte patina, a worn stone mortar and pestle dusted with fine powder, and coarse-fibered handwritten paper prescription slips inscribed with neat brush-calligraphy Chinese characters in traditional kaishu ("甘草三钱", "桂花蜜"), edges frayed and fibrous. Caught mid-moment: a toppled pewter canister on its side, its mouth open, several goji berries still rolling and scattering outward, each casting its own thin needle-long shadow. Composition breathes through density-and-void — the packed cluster on the right balanced against a broad expanse of empty bare-wood negative space on the left. Monochromatic warm palette throughout — amber, tangerine-orange, aged brass gold — broken by a single note of dark roselle red. Textures must hold up to magnification: the brittle thinness of dried petals, the dull oxidized brass, the ragged paper fiber edges, the grain of loose powder. Natural directional light, no artificial glow, clean crisp shadows, realistic material rendering, restrained and elegant, macro-detailed food-and-herb still-life photography, shot with an 85mm lens, wide horizontal landscape framing, wide 16:9 aspect ratio, full-bleed.

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Generated with Qwen Image 2.0 on Atlas Cloud

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Prompt

A three-panel horizontal manga strip following a teenage inventor girl in her cluttered attic workshop. In the first panel she sketches a small flying machine by warm lamplight, in the second the contraption sputters and lifts off mid-air scattering bolts, in the third she throws both fists up grinning in triumph. Clean bilingual speech bubbles carry crisp English and Japanese lettering, drawn with confident ink linework and screentone shading, warm amber lamp glow balanced against cool workshop shadows. Character design stays consistent across all three panels, gestures stay expressive, and the story reads left to right with clear sequential flow. Vibrant cel-shaded anime illustration style with bold clean outlines. Wide 16:9 aspect ratio, full-bleed.

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Generated with Qwen Image 2.0 on Atlas Cloud

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

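Real Production Work the ERNIE Image API Can Handle

From text-accurate posters and multi-panel comics to bilingual campaigns, product catalogs, UI mockups, and annotated infographics, the ERNIE Image API turns precise prompts into layout-accurate visuals across a wide range of content production workflows.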

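Marketing Assets and Posters with the ERNIE Image API

Thanks to the model's leading text accuracy, legible headlines, prices, and product copy render directly into marketing posters and banners. Marketing teams can deliver print-ready assets without a separate typesetting step.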

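Comics and Sequential Storytelling

Because the model understands grid-based layouts and multi-panel structure, it can render coherent comic pages with dialogue placed inside speech bubbles. Independent creators and studios can draft complete storyboards quickly without redrawing every frame by hand.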

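Bilingual Campaign Localization with the ERNIE Image API

Native support for English and Chinese prompts means a single pipeline can produce on-brand visuals for both markets, with text rendered correctly in each writing system. Global teams can localize creative work without building a separate design pipeline for every language.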

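E-commerce Product Visuals at Scale

Generate lifestyle scenes, product mockups, and promotional images for an entire catalog through one API, with up to ten images per request. The Turbo variant compresses inference to eight steps, so high-traffic stores can refresh a full catalog in minutes.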
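A catalog refresh mostly comes down to turning product records into prompts. The sketch below pairs each SKU with one request body and caps variants at the ten-image limit. The prompt template, the `Product` fields, and the `n` parameter name are illustrative assumptions, not a documented schema.

```python
from dataclasses import dataclass

MAX_IMAGES_PER_REQUEST = 10  # Turbo limit stated on this page

# Hypothetical prompt template; adapt the wording to your brand guidelines.
TEMPLATE = "Lifestyle photo of {name} {scene}, soft natural light, clean composition"


@dataclass
class Product:
    sku: str
    name: str
    scene: str


def catalog_requests(products: list[Product], variants: int = 4, seed: int = 7) -> list[tuple[str, dict]]:
    """Pair each SKU with one request body asking for `variants` images."""
    if not 1 <= variants <= MAX_IMAGES_PER_REQUEST:
        raise ValueError(f"variants must be between 1 and {MAX_IMAGES_PER_REQUEST}")
    # The SKU stays client-side for bookkeeping; only prompt, n, and seed are sent.
    return [
        (p.sku, {"prompt": TEMPLATE.format(name=p.name, scene=p.scene), "n": variants, "seed": seed})
        for p in products
    ]


catalog = [
    Product("MUG-001", "a speckled ceramic mug", "on a breakfast table"),
    Product("BAG-002", "a canvas tote bag", "on a park bench"),
]
for sku, body in catalog_requests(catalog):
    print(sku, body["prompt"])
```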

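UI and Product Mockups

Need realistic interfaces for a pitch? The model renders app screens and website mockups with legible labels, buttons, and body copy, giving product teams demo-ready prototypes before they build a single component.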

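Educational Infographics with the ERNIE Image API

Strong instruction following lets a single generation combine imagery with clearly labeled diagrams, charts, and annotations. Educators and analysts can turn dense source material into explanatory graphics that stay legible across display sizes.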

ERNIE Image Measured Against Rival Text-to-Image Models

See where ERNIE Image lands next to other open and proprietary generators across developer origin, access model, bilingual text rendering, and per-image cost.

Model	Developer	Access Model	Bilingual Text Rendering (EN + ZH)	Price (per image)
ERNIE-Image	Baidu (ERNIE-Image Team)	Open weights, Apache 2.0	Industry-leading, LongTextBench 0.9733	Pay-as-you-go
ERNIE-Image Turbo	Baidu (ERNIE-Image Team)	Open weights, Apache 2.0	Retained through DMD-distilled 8-step inference	Pay-as-you-go
Qwen Image 2.0	Alibaba (Tongyi)	Open weights, Apache 2.0	Strong across 1K-token typography layouts	$0.035
Z-Image Turbo	Alibaba (Tongyi Lab)	Open weights, Apache 2.0	Handles complex Chinese signage alongside English	$0.005
Seedream v4.5	ByteDance	Proprietary	Designer-level rendering at native 4K	$0.04
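For budgeting, the per-image prices in the table scale linearly with volume. This sketch uses only the listed figures; ERNIE-Image is shown as pay-as-you-go without a fixed per-image price, so it is left out rather than guessed. `Decimal` avoids floating-point rounding on currency, and the results are list-price estimates only.

```python
from decimal import Decimal

# Per-image prices (USD) copied from the comparison table above.
# ERNIE-Image has no fixed per-image figure there, so it is deliberately omitted.
PRICE_PER_IMAGE = {
    "Qwen Image 2.0": Decimal("0.035"),
    "Z-Image Turbo": Decimal("0.005"),
    "Seedream v4.5": Decimal("0.04"),
}


def batch_cost(model: str, images: int) -> Decimal:
    """Estimated list-price cost in USD for `images` generations."""
    return PRICE_PER_IMAGE[model] * images


for model in PRICE_PER_IMAGE:
    print(f"{model}: ${batch_cost(model, 1000):.2f} per 1,000 images")
```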

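How to Use the ERNIE Image API for Readable Text in Images on Atlas Cloud

Get started in minutes: follow these simple steps to integrate and deploy the model through the Atlas Cloud platform.

Create an Atlas Cloud Account

Sign up at atlascloud.ai and complete verification. New users receive free credits for exploring the platform and testing models.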

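Why Use the ERNIE Image API for Readable Text in Images on Atlas Cloud

Pairing the ERNIE Image API for Readable Text in Images with Atlas Cloud's GPU-accelerated platform delivers high performance, scalability, and a streamlined developer experience.

Performance and Flexibility

Low latency:
GPU-optimized inference for real-time responses.

Unified API:
Integrate once to use the ERNIE Image API for Readable Text in Images, GPT, Gemini, and DeepSeek.

Transparent pricing:
Per-token billing with serverless support.

Enterprise and Scale

Developer experience:
SDKs, analytics, fine-tuning tools, and templates in one place.

Reliability:
99.99% uptime, RBAC access control, and compliance logging.

Security and compliance:
SOC 2 Type II certified, HIPAA compliant, with U.S. data sovereignty.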

ERNIE Image API: Questions Developers Ask Most

The ERNIE Image API gives developers programmatic access to Baidu's open-weight text-to-image model, an 8B single-stream Diffusion Transformer paired with a Prompt Enhancer that expands short prompts into richer, more structured descriptions. On Atlas Cloud you reach it through one OpenAI-compatible endpoint with pay-as-you-go pricing and Day-0 access.

Its standout strength is legible in-image text. The model scores 0.9733 on LongTextBench in English, the top result among open-weight models, which makes it dependable for posters, comic speech bubbles, infographics, and UI mockups where every character has to be spelled correctly.

Both variants share the same 8B architecture but trade quality against speed. The Standard model runs 50 inference steps at guidance scale 4.0 for maximum fidelity on final assets, while the Turbo variant is distilled with DMD and reinforcement learning down to roughly 8 steps for rapid, high-volume generation.
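The step counts above make the trade-off easy to quantify: 50 / 8 = 6.25, so Turbo runs 6.25 times fewer denoising steps. Wall-clock speedup will not necessarily match that ratio, since text encoding, image decoding, and the Prompt Enhancer add costs that do not shrink with the step count. The sketch records the two presets as data (Turbo's guidance scale is not stated here, so it stays unset) and adds a hypothetical `pick_variant` helper that encodes the guidance in this answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantPreset:
    name: str
    steps: int
    guidance_scale: Optional[float]  # None where this page states no value


# Figures from this page; Turbo's guidance scale is not given, so it stays unset.
STANDARD = VariantPreset("ERNIE-Image", steps=50, guidance_scale=4.0)
TURBO = VariantPreset("ERNIE-Image Turbo", steps=8, guidance_scale=None)


def step_reduction(fast: VariantPreset, slow: VariantPreset) -> float:
    """Ratio of denoising steps: how many times fewer steps `fast` runs than `slow`."""
    return slow.steps / fast.steps


def pick_variant(final_asset: bool) -> VariantPreset:
    """Hypothetical helper: Standard for final assets, Turbo for drafts and bulk runs."""
    return STANDARD if final_asset else TURBO


print(f"Turbo runs {step_reduction(TURBO, STANDARD):.2f}x fewer denoising steps")
```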

Yes. Prompts are supported in English, Chinese, and Japanese through the same encoder, and text stays reliable across scripts, scoring 0.9661 on the Chinese LongTextBench. Where several competing models degrade sharply on Chinese characters, this one keeps Simplified, Traditional, and mixed bilingual copy clean.

The Turbo endpoint accepts seven preset sizes through a single size parameter, ranging from a 1024x1024 square to 1376x768 landscape and 768x1376 portrait formats. You can also request up to ten images per call, fix a seed for reproducible results, and toggle the built-in Prompt Enhancer with the use_pe flag.
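Validating these parameters client-side catches mistakes before a billed call. This is a minimal sketch, assuming a JSON body that uses the `size`, `seed`, and `use_pe` names given above plus `n` for the image count (an assumption borrowed from the OpenAI images convention). Only three of the seven sizes are named on this page, so `KNOWN_SIZES` needs the rest filled in from the API reference.

```python
from typing import Optional

# The endpoint accepts seven presets; only these three are named on this page,
# so add the remaining ones from the official API reference.
KNOWN_SIZES = {"1024x1024", "1376x768", "768x1376"}
MAX_IMAGES = 10


def turbo_request(
    prompt: str,
    size: str = "1024x1024",
    n: int = 1,
    seed: Optional[int] = None,
    use_pe: bool = True,
) -> dict:
    """Validate parameters client-side and return a request body."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in KNOWN_SIZES:
        raise ValueError(f"unsupported size {size!r}")
    if not 1 <= n <= MAX_IMAGES:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES}")
    # "n" as the field name for the image count is an assumption.
    body = {"prompt": prompt, "size": size, "n": n, "use_pe": use_pe}
    if seed is not None:
        # A fixed seed makes repeated calls reproducible.
        body["seed"] = seed
    return body


print(turbo_request("Retro travel poster for Lisbon", size="1376x768", n=4, seed=2024, use_pe=False))
```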

Getting started takes a single API key. Sign up on Atlas Cloud, point your existing OpenAI-compatible client at the endpoint, and send a prompt with an optional size and seed to receive image URLs in the response. Billing is pay-as-you-go per call with Day-0 access to the model.
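The getting-started flow can be sketched with the Python standard library alone. The request below is built but not sent. The base URL is a placeholder, the model ID is hypothetical, and the `/images/generations` path assumes Atlas Cloud mirrors the OpenAI images route, which this page implies by calling the endpoint OpenAI-compatible but does not spell out. `ATLAS_API_KEY` is an illustrative environment variable name.

```python
import json
import os
import urllib.request
from typing import Optional

# Placeholders: copy the real base URL and model ID from the Atlas Cloud console.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "ernie-image-turbo"


def make_image_request(prompt: str, api_key: str, size: str = "1024x1024",
                       seed: Optional[int] = None, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style image generation request."""
    body = {"model": MODEL_ID, "prompt": prompt, "size": size}
    if seed is not None:
        body["seed"] = seed
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/images/generations",  # assumed OpenAI-style path
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# ATLAS_API_KEY is a hypothetical variable name for wherever you keep the key.
req = make_image_request("Minimalist poster that reads HELLO", os.environ.get("ATLAS_API_KEY", "test-key"), seed=42)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req); the JSON response carries the image URLs.
```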

In published benchmarks the model outperforms comparable open releases such as FLUX.2-klein-9B, scoring 0.8856 against 0.8481 on GenEval overall. Its widest lead is in text rendering, where FLUX.2-klein-9B drops to 0.2183 on Chinese while ERNIE Image holds above 0.96. For workloads built around readable in-image text and structured layouts, these results make it one of the strongest open-weight choices available.

Yes. ERNIE Image is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. Generated images can go into advertising, merchandise, publications, and other commercial products without license friction.

Explore More Collections

Seedance 2.0

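The Seedance 2.0 API gives you production-grade access to ByteDance's multimodal video model, with four input modalities (text, image, video, audio) and an industry-leading "Universal Reference" system that locks composition, camera movement, and character motion across shots. Integrate director-level control with a single API call at a flat rate of $0.09/second, with instant keys and no waitlist, backed by enterprise-grade uptime and compliance. Native 4K for Seedance 2.0 is now live!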

Grok Imagine

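The Grok Imagine API gives developers xAI's all-in-one suite for image, video, and audio generation. It produces images at up to 2K resolution with multilingual text rendering, and videos up to 15 seconds long with native synchronized audio and reference-image-based editing. On Atlas Cloud, a single key runs every Grok Imagine mode, so you can move between image, video, and audio without separate setup, starting at $0.02 per image and $0.05 per second.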

Gemini Omni Flash

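The Gemini Omni API brings the multimodal video generation and editing model that Google DeepMind announced at Google I/O 2026 into your stack. Gemini Omni fuses Gemini's reasoning engine with generative media, accepting any combination of text, image, video, and audio input to produce consistent, knowledge-grounded output. Refine results through natural conversation: replace objects, rewrite scenes, and switch styles while keeping physics, character appearance, and visual continuity intact. Atlas Cloud offers the full Gemini Omni Flash family through a unified API (text-to-video, image-to-video with up to 7 reference images, and reference-to-video) with transparent per-second pricing starting at $0.112 and no subscription required. Start building today.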

GPT Image 2

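The GPT Image 2 API gives developers access to OpenAI's latest image model, the successor to GPT Image 1.5. It generates and edits images, renders text accurately in Latin and CJK scripts, and handles layout well for posters, mockups, and infographics. On Atlas Cloud you can access it alongside more than 300 models through one unified API, with free credits, 99.99% uptime, and no OpenAI organization verification required.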

Google

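Google's most capable creative models are now fully available on Atlas Cloud. Veo 3.1 delivers cinematic video generation, Nano Banana 2 handles high-fidelity image creation, and Gemini brings multimodal intelligence to every workflow. Access the complete Google model suite with a single API key, with Day-0 availability and pay-as-you-go pricing.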

Seedance 2.0 Mini

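Seedance 2.0 Mini brings ByteDance's multimodal video generation to workflows where speed and cost matter most. It delivers Seedance 2.0's core capabilities in a lighter footprint: faster generation, lower cost per video, and the same API integration you already use. For teams running high-throughput pipelines or prototyping at scale, Mini is the most practical default.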

ByteDance

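From cinematic video generation to high-fidelity image creation, ByteDance's most powerful models are now live on Atlas Cloud. Run Seedance and Seedream at scale with the lowest inference pricing and zero infrastructure overhead.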

Alibaba

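Atlas Cloud brings Alibaba's full model lineup into one API: Qwen for language and image tasks, and Wan for video generation up to 1080p. Every model is pay-as-you-go with no subscription. You can reach the Alibaba API from your existing OpenAI-compatible client through a single base URL.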

OpenAI

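Atlas Cloud gives you access to the full OpenAI API product line, from GPT Image 2 for image generation to Sora 2 for video. Every model is pay-as-you-go with no monthly spending limits. Because the API is OpenAI-compatible, switching over only requires changing the base URL.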

xAI

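Build complete image and video workflows with the xAI API on Atlas Cloud. Generate at 2K resolution, edit with reference images, and animate images into video clips with synchronized audio.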

Kwaivgi

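Kwaivgi API pricing runs 15% below standard rates. Atlas Cloud offers Day-0 access to the latest Kling releases with pay-as-you-go pricing and no seat limits. One account and one key unlock every Kling model, from Standard to Master.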

Seedream 5.0 Pro

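The Seedream 5.0 Pro API gives developers on Atlas Cloud access to ByteDance's controllable image editing model. It targets edits precisely through anchors and coordinates, separates images into editable layers, blends multiple references, and matches colors and materials accurately, with multilingual text support at 2K and 3K resolution. On Atlas Cloud, a single key is all you need.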

One API for AI in every modality.
