ERNIE Image API for Readable Text in Images

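The ERNIE Image API brings the open-weight 8B Diffusion Transformer, released under Apache 2.0 by Baidu's ERNIE-Image Team, into your stack. It ranks at the top of LongTextBench with a score of 0.9733, keeping poster headlines and comic speech bubbles legible, while the distilled Turbo variant cuts inference from 50 steps to 8. Atlas Cloud serves it through a single OpenAI-compatible endpoint with transparent pay-as-you-go pricing. Start building now.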

Explore Leading Models

Atlas Cloud gives you access to the latest industry-leading creative models.

Baidu ERNIE Image Turbo Text-to-image

A fast, low-latency version of ERNIE Image by Baidu, optimized for rapid iteration and scalable image generation. It balances speed and quality, which makes it a good fit for real-time and high-throughput scenarios.

FREE

ERNIE Image API Endpoint Comparison: Standard vs. Turbo Text-to-Image

Choose the text-to-image endpoint that fits your speed and quality requirements.

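Modality	Description
ERNIE Image API (Text To Image)	Where the Turbo endpoint prioritizes throughput, the standard ERNIE Image API favors maximum output fidelity on the same text-to-image task. It suits final production work such as posters, editorial visuals, and commercial layouts, where getting every detail right matters more than delivery speed.
ERNIE Image Turbo API (Text To Image)	From a single text prompt, it generates up to 10 images per request in 7 aspect ratios, from a 1024-pixel square to a long edge of up to 1376 pixels. It is tuned for low latency, defaults to 8 inference steps, and includes a built-in Prompt Enhancer that expands overly short prompts before generation. Use it when fast iteration, live previews, and large batch runs matter more than squeezing out the last increment of quality.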
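
The choice described in the table can be sketched as a small routing helper. This is only an illustration: the `Workload` categories and the two model identifiers are hypothetical names invented for the example, not identifiers confirmed by this page.

```python
from enum import Enum


class Workload(Enum):
    """Workload categories taken from the table's wording (names are illustrative)."""
    FINAL_PRODUCTION = "final_production"   # posters, editorial visuals, commercial layouts
    RAPID_ITERATION = "rapid_iteration"
    LIVE_PREVIEW = "live_preview"
    LARGE_BATCH = "large_batch"


# Hypothetical model identifiers; look up the real ones in the Atlas Cloud catalog.
STANDARD_MODEL = "ernie-image"
TURBO_MODEL = "ernie-image-turbo"


def choose_model(workload: Workload) -> str:
    """Return the standard model for final assets and the Turbo model otherwise."""
    if workload is Workload.FINAL_PRODUCTION:
        return STANDARD_MODEL
    return TURBO_MODEL
```

A caller would pass `Workload.LIVE_PREVIEW` while drafting and switch to `Workload.FINAL_PRODUCTION` for the asset that ships.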

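Built for Text, Layout, and Control: ERNIE Image API

From industry-leading text rendering and structured multi-panel layouts to native bilingual prompts, a Prompt Enhancer that is on by default, seven output sizes, and reproducible Turbo batches, the ERNIE Image API turns precise instructions into production-ready images.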

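Legible Text Rendering with the ERNIE Image API

With a leading LongTextBench score of 0.9733, the model renders clear, correctly spelled text directly in generated images. Comic speech bubbles, poster headlines, infographic labels, and UI mockup copy all stay sharp and easy to read.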

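Structured Multi-Panel Layouts

Core capabilities such as generation, editing, compositing, and upscaling work alongside an understanding of grid-based spatial relationships. Together they produce coherent multi-panel sequences and formatted designs that designers can control through a single centralized workflow.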

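Bilingual Prompts with the ERNIE Image API

English and Chinese prompts are both handled natively by the same encoder pipeline, which captures idiomatic phrasing in either language. This bilingual fluency supports authentic visual storytelling for global campaigns and localized content.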

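Prompt Enhancer On by Default

A lightweight Prompt Enhancer is enabled by default and rewrites short inputs into richer, more structured descriptions before they reach the diffusion backbone. When literal control over the exact wording matters more, you can turn it off per request.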
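
A minimal sketch of the per-request toggle follows. The `use_pe` flag name comes from the FAQ further down this page; treating it as a boolean where `True` means "enhancer on" is an assumption, as is the `prompt` field name.

```python
def build_payload(prompt: str, literal: bool = False) -> dict:
    """Build a request body; literal=True switches the Prompt Enhancer off.

    Assumption: `use_pe` is a boolean and True enables the enhancer.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"prompt": prompt, "use_pe": not literal}
```

Leaving `literal` at its default matches the documented behavior (enhancer on); passing `literal=True` is for prompts whose exact wording, such as quoted headline text, should reach the model unchanged.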

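Seven Native Output Sizes

Seven native output sizes cover a 1024x1024 square, landscape compositions as wide as 1376x768, and portrait formats as tall as 768x1376. Each ratio is generated directly, so the composition stays intact in every format.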
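
The sketch below shows one way to pick a size string by orientation. It lists only the three sizes this page names; the other four presets are not given here, so they are deliberately left out rather than guessed. The `WIDTHxHEIGHT` string format is assumed from how the sizes are written on this page.

```python
# Only the three sizes named on this page; the remaining four presets are not listed here.
KNOWN_SIZES = ("1024x1024", "1376x768", "768x1376")


def parse_size(size: str) -> tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integer width and height."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def orientation(size: str) -> str:
    """Classify a size as 'square', 'landscape', or 'portrait'."""
    width, height = parse_size(size)
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"


def pick_size(wanted: str) -> str:
    """Return the first known size with the requested orientation."""
    for size in KNOWN_SIZES:
        if orientation(size) == wanted:
            return size
    raise ValueError(f"no known size for orientation {wanted!r}")
```

For example, a 16:9 banner brief would map to `pick_size("landscape")`, since 1376x768 is the widest landscape size named here.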

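Turbo Mode in the ERNIE Image API

Need high volume without the wait? Turbo mode runs in as few as 8 inference steps and returns up to 10 images per request, and an explicit seed makes every result reproducible.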
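
As a rough sketch of batching under the 10-image cap, the helper below splits a large job into per-request counts and attaches a fixed seed. The field names `n` and `seed` follow common OpenAI-style conventions and are assumptions; this page confirms the limits, not the parameter names.

```python
MAX_IMAGES_PER_REQUEST = 10  # cap stated on this page for the Turbo endpoint


def split_into_requests(total_images: int) -> list[int]:
    """Split a job into per-request image counts of at most 10 each."""
    if total_images < 1:
        raise ValueError("total_images must be at least 1")
    full, rest = divmod(total_images, MAX_IMAGES_PER_REQUEST)
    return [MAX_IMAGES_PER_REQUEST] * full + ([rest] if rest else [])


def turbo_batch_payload(prompt: str, n: int, seed: int) -> dict:
    """Build one request body; `n` and `seed` are assumed field names."""
    if not 1 <= n <= MAX_IMAGES_PER_REQUEST:
        raise ValueError("n must be between 1 and 10")
    return {"prompt": prompt, "n": n, "seed": seed}
```

Reusing the same prompt and seed yields an identical request body, which is the precondition for the reproducibility described above.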

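ERNIE Image Head-to-Head: One Prompt, Three Models

Hand the identical creative brief to the flagship ERNIE Image model, a popular competitor, and its faster sibling, then compare side by side how each handles typography, layout, and lighting.

Prompt

Overhead flat-lay still-life photograph, camera locked directly overhead and shooting straight down onto the weathered pale elm apothecary counter of a traditional Chinese herbal tea shop. Hard, directional late-morning window light rakes in at a low angle from the right and becomes the true subject of the frame, casting long, crisp, elongated shadows that stretch left across the raw wood grain and form leading lines. In the dense zone on the right of the frame, tightly clustered clear glass jars glow as sunlight passes through them: translucent dried chrysanthemum buds, red goji berries, curled amber aged tangerine peel, and deep crimson dried roselle petals that catch the light. A small oxidized brass hand scale with a matte patina, a worn stone mortar and pestle dusted with fine powder, and a coarse-fiber handwritten prescription slip bearing Traditional Chinese regular script in neat brush calligraphy (「甘草三钱」、「桂花蜜」), its edges frayed and fuzzy with clearly visible fibers. Capture a moment in progress: a pewter canister tipped on its side, mouth open, with several goji berries still rolling and spilling outward, each casting a needle-fine, long, thin shadow. The composition breathes between density and empty space: the crowded cluster of objects on the right is balanced by a broad expanse of bare-wood negative space on the left. An overall monochromatic warm palette of amber, orange, and aged brass gold, broken only by a single touch of deep roselle red. Materials must hold up under magnification: the brittle thinness of dried petals, the dull finish of oxidized brass, the rough fibers at the paper edges, the granular texture of loose powder. Natural directional light, no artificial glow, clean crisp shadows, realistic material rendering, restrained and elegant, macro-detailed ingredient and herbal still-life photography, shot with an 85mm lens, wide horizontal landscape composition, wide 16:9 aspect ratio, full-bleed.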

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Generated with Qwen Image 2.0 on Atlas Cloud

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

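Prompt

A three-panel horizontal comic strip following a teenage girl inventor at work in a cluttered attic workshop. In panel one, she sketches a small flying machine under warm lamplight; in panel two, the device crackles and lifts into the air as bolts fly off in all directions; in panel three, she raises both fists and grins in triumph. Clean bilingual speech bubbles show crisp English and Japanese lettering, paired with confident, sharp ink lines and screentone shading, with warm amber lamplight balanced against the cool shadows of the workshop. The character design stays consistent across all three panels, the poses are expressive, and the story reads left to right in a smooth, clear sequence. Vivid cel-shaded anime illustration style with bold, clean black outlines. Wide 16:9 aspect ratio, full-bleed.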

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Generated with Qwen Image 2.0 on Atlas Cloud

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Real Production Work the ERNIE Image API Handles

From text-perfect posters and multi-panel comics to bilingual campaigns, product catalogs, interface mockups, and labeled infographics, the ERNIE Image API turns precise prompts into layout-accurate visuals across every content pipeline.

Marketing and Poster Production with the ERNIE Image API

Legible headlines, pricing, and product copy render straight into campaign posters and banners thanks to the model's leading text accuracy. Marketing teams ship print-ready assets directly, with no separate typesetting step required.

Comics and Sequential Storytelling

Because the model understands grid-based layout and multi-panel structure, it renders coherent comic pages with dialogue set inside speech bubbles. Independent creators and studios draft full storyboards without redrawing every frame by hand.

Bilingual Campaign Localization with the ERNIE Image API

Native English and Chinese prompt support means one workflow produces on-brand visuals for both markets, with text rendered correctly in each script. Global teams localize creative without maintaining a separate design pipeline for each language.

E-Commerce Product Visuals at Scale

Generate lifestyle scenes, product mockups, and promotional imagery across a full catalog through a single API. The Turbo variant compresses inference to eight steps, which shortens turnaround when high-volume stores refresh large catalogs.

Interface and Product Mockups

Need realistic screens for a pitch? The model renders app interfaces and website mockups with readable labels, buttons, and body copy, giving product teams presentation-ready prototypes before a single component is built.

Educational Infographics with the ERNIE Image API

Strong instruction following pairs imagery with clearly labeled diagrams, charts, and callouts in a single generation. Educators and analysts turn dense source material into explainer graphics that stay legible at any display size.

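ERNIE Image vs. Competing Text-to-Image Models

See where ERNIE Image stands against other open and proprietary generators on developer, access model, bilingual text rendering, and cost per image.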

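Model	Developer	Access model	Bilingual text rendering (EN + ZH)	Price (per image)
ERNIE-Image	Baidu (ERNIE-Image Team)	Open weights, Apache 2.0	Industry-leading, LongTextBench 0.9733	Pay-as-you-go
ERNIE-Image Turbo	Baidu (ERNIE-Image Team)	Open weights, Apache 2.0	Performance maintained with DMD-distilled 8-step inference	Pay-as-you-go
Qwen Image 2.0	Alibaba (Tongyi)	Open weights, Apache 2.0	Strong on typographic layouts with 1K-token text	$0.035
Z-Image Turbo	Alibaba (Tongyi Lab)	Open weights, Apache 2.0	Handles complex Chinese signage and English together	$0.005
Seedream v4.5	ByteDance	Proprietary	Designer-grade rendering at native 4K	$0.04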
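
To make the per-image prices concrete, here is a small cost estimate for a batch. It covers only the three rows that list a dollar figure; the ERNIE rows say "Pay-as-you-go" without a number, so they are omitted rather than filled in. `Decimal` is used so that currency arithmetic stays exact.

```python
from decimal import Decimal

# Per-image prices exactly as listed in the comparison table above.
PRICE_PER_IMAGE_USD = {
    "Qwen Image 2.0": Decimal("0.035"),
    "Z-Image Turbo": Decimal("0.005"),
    "Seedream v4.5": Decimal("0.04"),
}


def batch_cost(model: str, images: int) -> Decimal:
    """Return the listed per-image price multiplied by the image count."""
    if images < 0:
        raise ValueError("images must not be negative")
    return PRICE_PER_IMAGE_USD[model] * images
```

At these list prices, 1,000 images cost $35 on Qwen Image 2.0, $5 on Z-Image Turbo, and $40 on Seedream v4.5.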

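How to Use ERNIE Image API for Readable Text in Images on Atlas Cloud

Get started in minutes: follow the simple steps below to integrate and deploy the model through the Atlas Cloud platform.

Create an Atlas Cloud Account

Sign up at atlascloud.ai and complete verification. New users receive free credits to explore the platform and test models.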

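Why Use ERNIE Image API for Readable Text in Images on Atlas Cloud

Pairing the ERNIE Image API for Readable Text in Images model with Atlas Cloud's GPU-accelerated platform delivers high performance, scalability, and a smooth developer experience.

Performance and Flexibility

Low latency:
GPU-optimized inference for real-time responses.

Unified API:
Integrate once and use ERNIE Image API for Readable Text in Images, GPT, Gemini, and DeepSeek.

Transparent pricing:
Per-token billing, with Serverless mode supported.

Enterprise and Scale

Developer experience:
SDKs, analytics, fine-tuning tools, and templates in one place.

Reliability:
99.99% availability, RBAC access control, and compliance logging.

Security and compliance:
SOC 2 Type II certification, HIPAA compliance, and US data sovereignty.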

ERNIE Image API: Questions Developers Ask Most

The ERNIE Image API gives developers programmatic access to Baidu's open-weight text-to-image model, an 8B single-stream Diffusion Transformer paired with a Prompt Enhancer that expands short prompts into richer, more structured descriptions. On Atlas Cloud you reach it through one OpenAI-compatible endpoint with pay-as-you-go pricing and Day-0 access.

Its standout strength is legible in-image text. The model scores 0.9733 on LongTextBench in English, the top result among open-weight models, which makes it dependable for posters, comic speech bubbles, infographics, and UI mockups where every character has to be spelled correctly.

Both variants share the same 8B architecture but trade quality against speed. The Standard model runs 50 inference steps at guidance scale 4.0 for maximum fidelity on final assets, while the Turbo variant is distilled with DMD and reinforcement learning down to roughly 8 steps for rapid, high-volume generation.
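
The two sampler settings can be written down side by side, as in the sketch below. The step counts and the guidance scale of 4.0 come from the paragraph above; no guidance value is given for Turbo, so it is left as `None`. The class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplerConfig:
    name: str
    steps: int
    guidance_scale: Optional[float]  # None where this page gives no value


STANDARD = SamplerConfig(name="standard", steps=50, guidance_scale=4.0)
TURBO = SamplerConfig(name="turbo", steps=8, guidance_scale=None)


def step_ratio(slow: SamplerConfig, fast: SamplerConfig) -> float:
    """Ratio of denoising steps; not a measured wall-clock speedup."""
    return slow.steps / fast.steps
```

`step_ratio(STANDARD, TURBO)` evaluates to 6.25, meaning Turbo runs about one sixth as many denoising steps. Actual latency also depends on per-step cost and on overhead outside the sampler.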

Multilingual prompts are supported: English, Chinese, and Japanese all go through the same encoder, and text stays reliable across scripts, scoring 0.9661 on the Chinese LongTextBench. Where several competing models degrade sharply on Chinese characters, this one keeps Simplified, Traditional, and mixed bilingual copy clean.

The Turbo endpoint accepts seven preset sizes through a single size parameter, ranging from a 1024x1024 square to 1376x768 landscape and 768x1376 portrait formats. You can also request up to ten images per call, fix a seed for reproducible results, and toggle the built-in Prompt Enhancer with the use_pe flag.

Getting started takes a single API key. Sign up on Atlas Cloud, point your existing OpenAI-compatible client at the endpoint, and send a prompt with an optional size and seed to receive image URLs in the response. Billing is pay-as-you-go per call with Day-0 access to the model.
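
The request described above might be assembled as follows, using only the Python standard library. Everything specific here is an assumption for illustration: the base URL is a placeholder, the `/images/generations` path mirrors the OpenAI image API convention, and the model identifier is hypothetical. The function builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.invalid/v1"  # placeholder; use the URL from your account
IMAGES_PATH = "/images/generations"          # assumed OpenAI-style path
MODEL_ID = "ernie-image-turbo"               # hypothetical model identifier


def build_request(api_key: str, prompt: str,
                  size: Optional[str] = None,
                  seed: Optional[int] = None) -> urllib.request.Request:
    """Assemble a POST request with a prompt and optional size and seed."""
    body = {"model": MODEL_ID, "prompt": prompt}
    if size is not None:
        body["size"] = size
    if seed is not None:
        body["seed"] = seed
    return urllib.request.Request(
        BASE_URL + IMAGES_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen` and reading the image URLs out of the JSON response; the exact response shape is not specified on this page, so check it against the platform documentation before relying on field names.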

In published benchmarks the model outperforms comparable open releases such as FLUX.2-klein-9B, scoring 0.8856 against 0.8481 on GenEval overall. Its widest lead is in text rendering, where FLUX.2 collapses to 0.2183 on Chinese while ERNIE Image holds above 0.96. For workloads built around readable in-image text and structured layouts, it is currently the strongest open-weight choice.
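
The size of those gaps is easier to judge with the arithmetic written out. The scores below are the ones quoted in this FAQ (the 0.9661 Chinese LongTextBench figure appears in an earlier answer); the dictionary layout and helper name are illustrative only.

```python
# Scores as quoted in this FAQ.
GENEVAL_OVERALL = {"ERNIE Image": 0.8856, "FLUX.2-klein-9B": 0.8481}
LONGTEXT_ZH = {"ERNIE Image": 0.9661, "FLUX.2-klein-9B": 0.2183}


def margin(scores: dict, leader: str, other: str) -> float:
    """Absolute score difference, rounded to four decimal places."""
    return round(scores[leader] - scores[other], 4)


geneval_gap = margin(GENEVAL_OVERALL, "ERNIE Image", "FLUX.2-klein-9B")
chinese_text_gap = margin(LONGTEXT_ZH, "ERNIE Image", "FLUX.2-klein-9B")
```

The GenEval gap works out to 0.0375 points, while the Chinese text-rendering gap is 0.7478, which is why the answer above singles out text rendering as the widest lead.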

ERNIE Image can be used commercially. It is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. Generated images can go into advertising, merchandise, publications, and other commercial products without license friction.

