ERNIE Image API for Readable Text in Images

The ERNIE Image API brings Baidu's open-weight 8B Diffusion Transformer to your stack, released by the ERNIE-Image Team under Apache 2.0. It tops LongTextBench at 0.9733, keeping poster headlines and comic speech bubbles legible, while a distilled Turbo variant cuts inference from 50 steps to 8. Atlas Cloud serves it through one OpenAI-compatible endpoint with transparent pay-as-you-go pricing. Start building today.

Explore the Leading ERNIE Image API for Readable Text in Images

Atlas Cloud provides you with the latest industry-leading creative models.

Baidu ERNIE Image Turbo Text-to-image

A fast, low-latency version of ERNIE Image by Baidu, optimized for rapid iteration and scalable image generation. It balances speed and quality, making it ideal for real-time and high-throughput scenarios.

FREE

ERNIE Image API Endpoints Compared: Standard and Turbo Text-to-Image

Match each text-to-image endpoint to your speed and quality needs.

Endpoint	Description
ERNIE Image API (Text To Image)	Where the Turbo endpoint prioritizes throughput, the standard ERNIE Image API leans toward maximum output fidelity on the same text-to-image task. It fits final production work such as posters, editorial graphics, and commercial layouts, where getting every detail right outweighs turnaround time.
ERNIE Image Turbo API (Text To Image)	Turn a single text prompt into as many as ten images per request across seven aspect ratios, from square 1024 pixels up to 1376 pixels on the long edge. Tuned for low latency, it defaults to eight inference steps and ships a built-in Prompt Enhancer that expands terse prompts before generation. Reach for it when rapid iteration, real-time previews, and high-volume batch runs matter more than squeezing out the last increment of quality.

Built for Text, Layout, and Control: the ERNIE Image API

From industry-leading text rendering and structured multi-panel layouts to native bilingual prompting, a default prompt enhancer, seven output dimensions, and reproducible Turbo batches, the ERNIE Image API turns precise instructions into production-ready imagery.

Legible Text Rendering with the ERNIE Image API

The model renders legible, correctly spelled text straight into generated images, as reflected in its leading LongTextBench score of 0.9733. Comic speech bubbles, poster headlines, infographic labels, and UI mockup copy all stay sharp and readable.

Structured, Multi-Panel Layouts

The model's grasp of grid-based spatial relationships works alongside generation, edit, composite, and upscale primitives. Together they yield coherent multi-panel sequences and formatted designs that designers can drive through one centralized pipeline.

Bilingual Prompting in the ERNIE Image API

Both English and Chinese prompts run natively through the same encoder pipeline, capturing idiomatic phrasing in either language. This dual fluency supports authentic visual storytelling for global campaigns and localized content alike.

Prompt Enhancer Enabled by Default

Enabled by default, a lightweight Prompt Enhancer rewrites short inputs into richer, structured descriptions before they reach the diffusion backbone. Toggle it off per request whenever literal control over exact wording matters more.
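A minimal sketch of the per-request toggle, in Python. The `use_pe` flag name is the one this page gives for the Turbo endpoint; the `prompt` key and the `build_body` helper are assumptions for illustration, and whether the Standard endpoint accepts the same flag is not stated here.

```python
import json

def build_body(prompt: str, literal: bool = False) -> dict:
    """Build a request body; literal=True asks for the prompt to be used as written."""
    body = {"prompt": prompt}  # "prompt" key is an assumed field name
    if literal:
        # The enhancer is on by default, so the flag is only sent to switch it off.
        body["use_pe"] = False
    return body

# Exact headline wording matters here, so the enhancer is disabled.
print(json.dumps(build_body('Poster headline reading "GRAND OPENING"', literal=True)))
```

Leaving the flag out when it is not needed keeps the request on the documented default rather than restating it.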

Seven Native Output Dimensions

Seven native output sizes span a square 1024x1024, landscape framings up to 1376x768, and portrait shapes down to 768x1376. Each ratio is generated directly, so framing stays intact across every format.
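As a rough illustration, the three sizes named above can be kept in a small lookup. This is a sketch only: the other four presets are not listed on this page and are deliberately left out, and the `WIDTHxHEIGHT` string format and the helper names are assumptions.

```python
# Only the sizes named in the text; the remaining four presets are not given.
KNOWN_SIZES = {
    "square": (1024, 1024),
    "landscape": (1376, 768),
    "portrait": (768, 1376),
}

def size_string(orientation: str) -> str:
    """Format a known preset as 'WIDTHxHEIGHT' (assumed parameter format)."""
    width, height = KNOWN_SIZES[orientation]
    return f"{width}x{height}"

def aspect_ratio(orientation: str) -> float:
    """Width divided by height for a known preset."""
    width, height = KNOWN_SIZES[orientation]
    return width / height

print(size_string("landscape"), round(aspect_ratio("landscape"), 2))
```

The widest landscape preset works out to roughly 1.79:1, close to but not exactly 16:9.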

The ERNIE Image API in Turbo Mode

Need volume without the wait? Turbo mode runs as few as 8 inference steps and returns up to 10 images per request, while an explicit seed keeps every result reproducible.
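A hedged sketch of how a client might guard the limits described above before sending a Turbo batch. The ten-image cap and the seed come from this page; the field names `n` and `seed` and the `TurboRequest` class are assumptions borrowed from common OpenAI-style conventions, not confirmed parameter names.

```python
from dataclasses import dataclass, asdict
from typing import Optional

MAX_IMAGES_PER_REQUEST = 10  # cap stated for the Turbo endpoint

@dataclass
class TurboRequest:
    prompt: str
    n: int = 1                  # assumed field name for the image count
    seed: Optional[int] = None  # assumed field name; fix it to reproduce a batch

    def to_body(self) -> dict:
        """Validate the image count and drop unset optional fields."""
        if not 1 <= self.n <= MAX_IMAGES_PER_REQUEST:
            raise ValueError(f"n must be between 1 and {MAX_IMAGES_PER_REQUEST}")
        return {key: value for key, value in asdict(self).items() if value is not None}

print(TurboRequest("three-panel manga strip", n=10, seed=42).to_body())
```

Validating on the client side avoids spending a round trip on a request the endpoint would reject.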

ERNIE Image Head to Head: One Prompt, Three Models

Feed the exact same brief to the flagship ERNIE Image model, a popular rival, and ERNIE Image's faster Turbo sibling, then judge side by side how each one renders typography, layout, and light.

Prompt

Top-down flatlay still life, camera locked perfectly overhead looking straight down onto a weathered pale-elm apothecary counter of a traditional Chinese herbal-tea dispensary. Hard directional late-morning window light rakes in low from the right, the true protagonist of the frame — casting long, crisp, elongated shadows that stretch leftward across the raw wood grain and act as leading lines. On the dense right side, tightly clustered clear glass jars glow as the sun passes through them: translucent dried chrysanthemum buds, red goji berries, curled amber tangerine peel (chenpi), and deep crimson dried roselle petals catching the light. A small oxidized brass hand-balance scale with matte patina, a worn stone mortar and pestle dusted with fine powder, and coarse-fibered handwritten paper prescription slips inscribed with neat brush-calligraphy Chinese characters in traditional kaishu ("甘草三钱", "桂花蜜"), edges frayed and fibrous. Caught mid-moment: a toppled pewter canister on its side, its mouth open, several goji berries still rolling and scattering outward, each casting its own thin needle-long shadow. Composition breathes through density-and-void — the packed cluster on the right balanced against a broad expanse of empty bare-wood negative space on the left. Monochromatic warm palette throughout — amber, tangerine-orange, aged brass gold — broken by a single note of dark roselle red. Textures must hold up to magnification: the brittle thinness of dried petals, the dull oxidized brass, the ragged paper fiber edges, the grain of loose powder. Natural directional light, no artificial glow, clean crisp shadows, realistic material rendering, restrained and elegant, macro-detailed food-and-herb still-life photography, shot with an 85mm lens, wide horizontal landscape framing, wide 16:9 aspect ratio, full-bleed.

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Generated with Qwen Image 2.0 on Atlas Cloud

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Prompt

A three-panel horizontal manga strip following a teenage inventor girl in her cluttered attic workshop. In the first panel she sketches a small flying machine by warm lamplight, in the second the contraption sputters and lifts off mid-air scattering bolts, in the third she throws both fists up grinning in triumph. Clean bilingual speech bubbles carry crisp English and Japanese lettering, drawn with confident ink linework and screentone shading, warm amber lamp glow balanced against cool workshop shadows. Character design stays consistent across all three panels, gestures stay expressive, and the story reads left to right with clear sequential flow. Vibrant cel-shaded anime illustration style with bold clean outlines. Wide 16:9 aspect ratio, full-bleed.

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Generated with Qwen Image 2.0 on Atlas Cloud

Generated with Baidu ERNIE Image Turbo on Atlas Cloud

Real Production Work the ERNIE Image API Handles

From text-perfect posters and multi-panel comics to bilingual campaigns, product catalogs, interface mockups, and labeled infographics, the ERNIE Image API turns precise prompts into layout-accurate visuals across every content pipeline.

Marketing and Poster Production with the ERNIE Image API

Legible headlines, pricing, and product copy render straight into campaign posters and banners thanks to the model's leading text accuracy. Marketing teams ship print-ready assets directly, with no separate typesetting step required.

Comics and Sequential Storytelling

Because the model understands grid-based layout and multi-panel structure, it renders coherent comic pages with dialogue set inside speech bubbles. Independent creators and studios draft full storyboards without redrawing every frame by hand.

Bilingual Campaign Localization with the ERNIE Image API

Native English and Chinese prompt support means one workflow produces on-brand visuals for both markets, with text rendered correctly in each script. Global teams localize creative without maintaining a separate design pipeline per language.

E-Commerce Product Visuals at Scale

Generate lifestyle scenes, product mockups, and promotional imagery across a full catalog through a single API. The Turbo variant compresses inference to eight steps, so high-volume stores can refresh entire catalogs quickly.

Interface and Product Mockups

Need realistic screens for a pitch? The model renders app interfaces and website mockups with readable labels, buttons, and body copy, giving product teams presentation-ready prototypes before a single component is built.

Educational Infographics with the ERNIE Image API

Strong instruction following pairs imagery with clearly labeled diagrams, charts, and callouts in a single generation. Educators and analysts turn dense source material into explainer graphics that stay legible at any display size.

ERNIE Image Measured Against Rival Text-to-Image Models

See where ERNIE Image lands next to other open and proprietary generators across developer origin, access model, bilingual text rendering, and per-image cost.

Model	Developer	Access Model	Bilingual Text Rendering (EN + ZH)	Price (per image)
ERNIE-Image	Baidu (ERNIE-Image Team)	Open weights, Apache 2.0	Industry-leading, LongTextBench 0.9733	Pay-as-you-go
ERNIE-Image Turbo	Baidu (ERNIE-Image Team)	Open weights, Apache 2.0	Retained through DMD-distilled 8-step inference	Pay-as-you-go
Qwen Image 2.0	Alibaba (Tongyi)	Open weights, Apache 2.0	Strong across 1K-token typography layouts	$0.035
Z-Image Turbo	Alibaba (Tongyi Lab)	Open weights, Apache 2.0	Handles complex Chinese signage alongside English	$0.005
Seedream v4.5	ByteDance	Proprietary	Designer-level rendering at native 4K	$0.04
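For budgeting, the listed per-image prices can be turned into a batch estimate. This is a sketch under the assumption that cost scales linearly with image count; the two ERNIE rows are omitted because the table gives them as pay-as-you-go without a figure, and `batch_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-image prices in USD, copied from the comparison table.
PRICE_PER_IMAGE_USD = {
    "Qwen Image 2.0": Decimal("0.035"),
    "Z-Image Turbo": Decimal("0.005"),
    "Seedream v4.5": Decimal("0.04"),
}

def batch_cost(model: str, images: int) -> Decimal:
    """Estimated cost of generating `images` images, assuming linear pricing."""
    return PRICE_PER_IMAGE_USD[model] * images

for name in PRICE_PER_IMAGE_USD:
    print(name, batch_cost(name, 1000))
```

`Decimal` is used instead of floats so that small per-image prices do not pick up rounding noise across large batches.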

How to Use ERNIE Image API for Readable Text in Images on Atlas Cloud

Get started in minutes — follow these simple steps to integrate and deploy models through Atlas Cloud's platform.

Create an Atlas Cloud Account

Sign up at atlascloud.ai and complete verification. New users receive free credits to explore the platform and test models.

Why Use ERNIE Image API for Readable Text in Images on Atlas Cloud

Combining the ERNIE Image models with Atlas Cloud's GPU-accelerated platform delivers strong performance, scalability, and developer experience.

Performance & flexibility

Low Latency:
GPU-optimized inference for real-time generation.

Unified API:
Run ERNIE Image, GPT, Gemini, and DeepSeek with one integration.

Transparent Pricing:
Predictable pay-as-you-go billing with serverless options.

Enterprise & Scale

Developer Experience:
SDKs, analytics, fine-tuning tools, and templates.

Reliability:
99.99% uptime, RBAC, and compliance-ready logging.

Security & Compliance:
SOC 2 Type II, HIPAA alignment, data sovereignty in the US.

ERNIE Image API: Questions Developers Ask Most

The ERNIE Image API gives developers programmatic access to Baidu's open-weight text-to-image model, an 8B single-stream Diffusion Transformer paired with a Prompt Enhancer that expands short prompts into richer, more structured descriptions. On Atlas Cloud you reach it through one OpenAI-compatible endpoint with pay-as-you-go pricing and Day-0 access.

Its standout strength is legible in-image text. The model scores 0.9733 on LongTextBench in English, the top result among open-weight models, which makes it dependable for posters, comic speech bubbles, infographics, and UI mockups where every character has to be spelled correctly.

The Standard and Turbo variants share the same 8B architecture but trade quality against speed. The Standard model runs 50 inference steps at guidance scale 4.0 for maximum fidelity on final assets, while the Turbo variant is distilled with DMD and reinforcement learning down to roughly 8 steps for rapid, high-volume generation.
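The two sampling configurations can be compared with a short calculation. This is a sketch: the step counts and the Standard guidance scale are the figures quoted above, the Turbo guidance scale is not stated and is left unset, and step count is only a rough proxy for denoising compute, not a latency measurement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SamplerPreset:
    steps: int
    guidance_scale: Optional[float]  # None where the page gives no value

STANDARD = SamplerPreset(steps=50, guidance_scale=4.0)
TURBO = SamplerPreset(steps=8, guidance_scale=None)

def step_reduction(full: SamplerPreset, distilled: SamplerPreset) -> float:
    """How many times fewer denoising steps the distilled preset runs."""
    return full.steps / distilled.steps

print(step_reduction(STANDARD, TURBO))  # 6.25
```

By this measure Turbo runs 6.25 times fewer denoising steps per image than Standard.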

The model handles prompts in more than one language. Prompts are supported in English, Chinese, and Japanese through the same encoder, and text stays reliable across scripts, scoring 0.9661 on the Chinese LongTextBench. Where several competing models degrade sharply on Chinese characters, this one keeps Simplified, Traditional, and mixed bilingual copy clean.

The Turbo endpoint accepts seven preset sizes through a single size parameter, ranging from a 1024x1024 square to 1376x768 landscape and 768x1376 portrait formats. You can also request up to ten images per call, fix a seed for reproducible results, and toggle the built-in Prompt Enhancer with the use_pe flag.

Getting started takes a single API key. Sign up on Atlas Cloud, point your existing OpenAI-compatible client at the endpoint, and send a prompt with an optional size and seed to receive image URLs in the response. Billing is pay-as-you-go per call with Day-0 access to the model.
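The flow above might look like the following sketch, which builds the HTTP request without sending it. Everything specific here is an assumption to check against Atlas Cloud's own reference: `BASE_URL` is a placeholder, `MODEL_ID` is a hypothetical identifier, and the `/images/generations` path, Bearer header, `seed` field name, and `data[].url` response shape follow the OpenAI images convention that an OpenAI-compatible endpoint would be expected to mirror.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://example.invalid/v1"  # placeholder; substitute the real base URL
MODEL_ID = "ernie-image-turbo"           # hypothetical model identifier

def build_request(api_key: str, prompt: str, size: str = "1024x1024",
                  seed: Optional[int] = None) -> urllib.request.Request:
    """Assemble a POST request for one prompt; nothing is sent over the network."""
    body = {"model": MODEL_ID, "prompt": prompt, "size": size}
    if seed is not None:
        body["seed"] = seed  # assumed field name for the reproducibility seed
    return urllib.request.Request(
        url=f"{BASE_URL}/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def extract_urls(response_json: dict) -> list:
    """Collect image URLs from an OpenAI-style response body."""
    return [item["url"] for item in response_json.get("data", [])]

request = build_request("YOUR_API_KEY", "Comic panel with a speech bubble", seed=7)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` and feeding the decoded JSON to `extract_urls` would complete the round trip once the placeholder values are replaced.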

In published benchmarks the model outperforms comparable open releases such as FLUX.2-klein-9B, scoring 0.8856 against 0.8481 on GenEval overall. Its widest lead is in text rendering, where FLUX.2 collapses to 0.2183 on Chinese while ERNIE Image holds above 0.96. For workloads built around readable in-image text and structured layouts, it is currently the strongest open-weight choice.

ERNIE Image can be used commercially. It is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. Generated images can go into advertising, merchandise, publications, and other commercial products without license friction.

Explore More Families

Seedance 2.0

The Seedance 2.0 API gives you production access to ByteDance's multimodal video model — quad-modal inputs (text, image, video, audio) and an industry-leading "Universal Reference" system that locks composition, camera movement, and character actions across shots. Integrate director-level control with one API call, a flat $0.09/s, instant key, and no waitlist — backed by enterprise-grade uptime and compliance. Seedance 2.0 Native 4K is now live!

Grok Imagine

The Grok Imagine API gives developers xAI's image, video, and audio generation in one suite. It produces up to 2K images with multilingual text rendering, plus video up to 15 seconds with native, synchronized audio and reference-based editing. On Atlas Cloud one key runs every Grok Imagine mode, so you move between image, video, and audio without separate setups, from $0.02 per image and $0.05 per second.

Gemini Omni Flash

The Gemini Omni API brings Google DeepMind's multimodal video generation and editing model, introduced at Google I/O 2026, to your stack. Gemini Omni fuses Gemini's reasoning engine with generative media, accepting any mix of text, images, video, and audio to produce consistent, knowledge-grounded output. Refine results through natural conversation, swapping objects, rewriting scenes, and shifting styles while physics, characters, and continuity stay intact. Atlas Cloud serves the full Gemini Omni Flash lineup, text-to-video, image-to-video with up to 7 reference images, and reference-to-video, through one unified API with transparent per-second pricing from $0.112 and no subscription. Start building today.

GPT Image 2

The GPT Image 2 API gives developers access to OpenAI's latest image model, the successor to GPT Image 1.5. It generates and edits images with accurate text rendering across Latin and CJK scripts, plus strong composition for posters, mockups, and infographics. On Atlas Cloud you reach it through one unified API alongside 300+ models, with free credits, 99.99% uptime, and no OpenAI organization verification required.

Google

Google's most powerful creative models are all available on Atlas Cloud. Veo 3.1 delivers cinematic video generation, Nano Banana 2 powers high-fidelity image creation, and Gemini brings multimodal intelligence to every workflow. Access the full Google model suite through one API key with Day-0 availability and pay-as-you-go pricing.

Seedance 2.0 Mini

The Seedance 2.0 Mini API is the lightest, lowest-cost tier of ByteDance's Seedance video line, built for teams where throughput and unit cost matter more than maximum polish. Use it for batch generation, rapid prototyping, and draft passes, all through one OpenAI-compatible key on Atlas Cloud.

ByteDance

From cinematic video generation to high-fidelity image creation, ByteDance's most powerful models are live on Atlas Cloud. Run Seedance and Seedream at scale with the lowest inference pricing and zero infrastructure overhead.

Alibaba

Atlas Cloud brings together Alibaba's full model lineup under one API: Qwen for language and image tasks, Wan for video generation up to 1080p. Access every model pay-as-you-go with no subscriptions. The Alibaba API is available via a single base URL using your existing OpenAI-compatible client.

OpenAI

Atlas Cloud gives you access to the full OpenAI API lineup, from GPT Image 2 for image generation to Sora 2 for video. Every model is available pay-as-you-go with no monthly commitment. Plug in with a single base URL swap using the OpenAI-compatible API.

xAI

Build complete image and video pipelines using the xAI API on Atlas Cloud. Generate at 2K, edit with reference images, and animate images into audio-synced clips.

Kwaivgi

The Kwaivgi API at 15% off standard rates. Day-0 access to every new Kling release, pay-as-you-go, no seat limits. One account covers the full Kling lineup.

Seedream 5.0 Pro

Seedream 5.0 Pro API gives developers ByteDance's controllable image editing model on Atlas Cloud. It places edits precisely with anchors and coordinates, separates images into editable layers, fuses multiple references, and matches exact colors and materials, with multilingual text at 2K and 3K. On Atlas Cloud you reach it through one key!

One API for All Media AI.
