Meet the Happy Horse 1.0 API & Happy Horse 1.1 API

The HappyHorse API on Atlas Cloud connects your application to Alibaba's HappyHorse 1.0 and 1.1 video generation models. Produce clips running anywhere from 3 to 15 seconds at 720p or 1080p, steer results with up to nine reference images, and pick the aspect ratio that fits your product. Atlas Cloud adds Day-0 model availability, reliable uptime, and a simple asynchronous REST integration. Start building today.

Explore the Leading Happy Horse Models

Atlas Cloud provides you with the latest industry-leading creative models.

HappyHorse-1.1 Reference-to-video

Generates videos from one to nine reference images and a text prompt, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

HappyHorse-1.1 Image-to-video

Animates a first-frame image into video with optional prompt guidance, 720P or 1080P output, and durations from 3 to 15 seconds.

HappyHorse-1.1 Text-to-video

Generates videos from text prompts with HappyHorse 1.1, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

HappyHorse-1.0 Text-to-video

Generates videos from text prompts with HappyHorse 1.0, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

HappyHorse-1.0 Image-to-video

Animates a first-frame image into video with optional prompt guidance, 720P or 1080P output, and durations from 3 to 15 seconds.

HappyHorse-1.0 Reference-to-video

Generates videos from one to nine reference images and a text prompt, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

HappyHorse-1.0 Video-edit

Edits an input video with text instructions and optional reference images, supporting 720P or 1080P output.

From $0.14/SEC

Happy Horse API Peak speed

Lowest cost

Modality	Description
HappyHorse-1.1 Text-to-Video API	Generates 1080p video with synchronized audio from a text prompt in a single pass. Version 1.1 brings improved motion expressiveness, stronger prompt adherence, and smarter scene planning for complex multi-shot narratives.
HappyHorse-1.1 Image-to-Video API	Animates a static image into fluid video while preserving subject identity and visual consistency. Version 1.1 delivers more physically grounded motion and better frame-to-frame continuity across dynamic compositions.
HappyHorse-1.1 Reference-to-Video API	Guides video generation using up to nine reference images for precise control over style, character identity, and brand elements. Version 1.1's upgraded multi-reference fusion keeps product details and visual fidelity stable across the full clip.
HappyHorse-1.0 T2V API (Text To Video)	Transforms detailed text prompts into cinematic video sequences with synchronized audio generation. Leverages a unified Transformer architecture for joint video-audio synthesis.
HappyHorse-1.0 I2V API (Image To Video)	Animates static images with fluid motion while maintaining visual consistency. Processes reference image latents jointly with text and audio tokens in a unified sequence.
HappyHorse-1.0 T2V+Audio API (Text to Video with Audio)	Generates complete audio-visual content from text alone: dialogue, environmental sounds, and Foley effects through unified token denoising.
HappyHorse-1.0 I2V+Audio API (Image to Video with Audio)	Transforms still images into animated scenes with synchronized soundscapes, with cinematic audio accompaniment generated in a single forward pass.

Inside the HappyHorse API: Unified Video and Audio in One Pass

From Arena-topping cinematic quality to native multilingual sound, the HappyHorse API combines unified multimodal generation, lifelike motion, image animation, and an asynchronous workflow behind one pay-as-you-go endpoint.

Cinematic Quality from the HappyHorse API

HappyHorse holds the top spot on the Artificial Analysis Video Arena for both text-to-video and image-to-video, judged by blind human preference. That ranking translates into footage with convincing detail, lighting, and cinematic composition.

Lip Sync Across Seven Languages

Native lip sync spans Mandarin, Cantonese, English, Japanese, Korean, German, and French, with dialogue and Foley generated alongside the video. Localized campaigns no longer need a separate dubbing or sound design pass.

Unified Multimodal Core of the HappyHorse API

Text, image, video, and audio tokens flow through one unified Transformer sequence instead of a chained multi-stage pipeline. Your integration stays simple because every generation mode shares the same underlying architecture.

Motion That Obeys Real-World Physics

When subjects run, collide, or interact, the physics stays believable, a strength behind HappyHorse leading blind image-to-video comparisons on the Video Arena. Sports clips, chase scenes, and dance sequences keep their weight on screen.

Still Photos to Living Scenes with the HappyHorse API

Feed one photograph to the image-to-video endpoint and watch it become fluid 1080p motion while subject identity stays intact. Product shots, portraits, and concept art all animate without losing their look.

Fast Turnaround, Asynchronous by Design

An asynchronous task workflow keeps generation moving: submit a prompt, poll for status, and retrieve the finished clip without blocking your application. Rapid iteration cycles fit tight campaign and prototyping deadlines.
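The submit, poll, retrieve cycle described above can be sketched as a small polling helper. This is a minimal illustration, not the Atlas Cloud client: the status names ("succeeded", "failed"), the shape of the task dictionary, and the `get_status` callable are all assumptions standing in for whatever the real status endpoint returns.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal status names; the real API's values may differ.
DONE_STATES = {"succeeded", "failed"}


def wait_for_task(
    get_status: Callable[[str], Dict],
    task_id: str,
    interval: float = 2.0,
    max_interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll get_status(task_id) until the task reaches a terminal state."""
    waited = 0.0
    while True:
        task = get_status(task_id)
        if task.get("status") in DONE_STATES:
            return task
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Back off gradually so long generations are not polled too often.
        interval = min(interval * 1.5, max_interval)
```

Passing the status lookup in as a callable keeps the loop independent of any particular HTTP library, and the injectable `sleep` makes it easy to exercise in tests without waiting.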

HappyHorse vs Other Models - One Prompt

The same prompt, generated by HappyHorse and other leading video models: cinematic short film and high-end commercial film

Prompt

Create a 10-second cinematic short film about a street musician and a child meeting in a sunny city square.

Scene 1, 0-2s: Wide establishing shot of a bright European-style city square in the late afternoon. A street musician sits near a fountain, playing an acoustic guitar. Warm sunlight, soft shadows, pigeons moving in the background, natural city ambience.
Scene 2, 2-5s: Medium tracking shot as a young child walks toward the musician while holding a small red balloon. The musician continues playing guitar. The child’s steps, the guitar rhythm, and the ambient crowd sound should feel synchronized and natural.
Scene 3, 5-8s: Close two-shot: the musician smiles and slightly changes the melody as the child stops beside him. The child gently taps the balloon in rhythm with the music. Keep both characters’ faces, clothing, body proportions, and spatial positions consistent.
Scene 4, 8-10s: Final cinematic pullback shot: the musician keeps playing, the child laughs, pigeons fly across the square, and the fountain sparkles in the sunlight. The music and city ambience fade naturally.

Requirements:
- Maintain consistent identity, clothing, and proportions for both characters across all shots
- Keep the red balloon, guitar, fountain, and city square layout consistent
- Use readable cinematic camera control: wide shot, tracking shot, close two-shot, final pullback
- Synchronize audio with action: acoustic guitar, footsteps, child laughter, pigeons, fountain, city ambience
- Natural human motion, stable hands, realistic guitar-playing movement
- No warped faces, no changing character outfits, no disappearing balloon or guitar
- Warm cinematic realism, polished short-film look, 1080p

HappyHorse 1.1

Kling V3.0

Pixverse V6

Prompt

Create a 10-second cinematic lifestyle commercial for a smart home coffee machine.

Scene 1, 0-2s: Wide shot of a bright modern kitchen in the morning. A smart coffee machine sits on a clean countertop beside a white ceramic cup. Soft sunlight enters through the window, warm and natural.
Scene 2, 2-5s: Medium shot of a young professional entering the kitchen and tapping the touchscreen on the coffee machine. The screen lights up smoothly. Add a soft button tap sound, quiet machine startup sound, and subtle morning ambience.
Scene 3, 5-8s: Close-up of coffee pouring into the cup. Steam rises naturally, the crema forms on the surface, and the machine remains the exact same design, size, color, touchscreen position, and material. The pouring sound should match the liquid motion.
Scene 4, 8-10s: Final hero shot: the person lifts the cup and smiles while the coffee machine stays sharp in the background. Camera slowly pulls back, clean premium home appliance commercial look.

Requirements:
- Maintain the exact same coffee machine design across all shots
- Keep the kitchen layout, countertop, cup, lighting, and product scale consistent
- Use readable cinematic camera control: wide shot, medium interaction shot, close-up, final pullback
- Synchronize audio with action: button tap, machine startup, coffee pouring, soft morning ambience
- Natural hand movement, realistic liquid physics, stable product geometry
- No warped hands, no changing touchscreen position, no inconsistent product details
- Bright cinematic realism, high-end lifestyle commercial style, 1080p

HappyHorse 1.1

Kling V3.0

Pixverse V6

Where the HappyHorse API Goes to Work

From native-audio social clips to multilingual campaigns, reference-guided characters, and text-driven edits, the HappyHorse API puts every generation mode behind one pay-as-you-go Atlas Cloud endpoint.

Short-Form Social Clips with the HappyHorse API

Dialogue, ambient sound, and Foley effects arrive in the same pass as your 3-to-15-second clips. Creators ship vertical content for TikTok, Reels, and Shorts without a separate dubbing pipeline.

Product Photos That Sell in Motion

If a product photo is all you have, image-to-video animation adds lifelike motion while preserving the original frame. Online sellers turn static catalogs into demo clips ready for listings and ads.

Multilingual Campaigns via the HappyHorse API

Native lip sync spans seven languages, from Mandarin and English to Japanese, Korean, German, and French. Global marketing teams turn one campaign into localized ads for every market without hiring voice talent.

Characters That Stay Consistent

Up to nine reference images steer reference-to-video generation, locking character identity, style, and brand elements in place. Episodic storytellers and virtual influencer studios keep faces consistent from scene to scene.

Edit Footage with Plain Text

Describe a change in plain text and the video-edit endpoint applies it directly to existing footage. Post-production teams restyle scenes, swap elements, and repair shots without scheduling a reshoot.

Previsualize Scenes Through the HappyHorse API

Need to see a scene before the crew shoots it? Arena-leading visual quality at 1080p turns scripts into previsualization footage that film teams and agencies use to align on framing and pacing.

Model	Input Types	Output Duration	Resolution	Audio Generation
HappyHorse-1.0	Text, Image	3~15s	1080P, 720P	√
Seedance 2.0	Text, Image	4~15s	1024×1024	√
Kling 3.0	Text, Image	3~15s	256P~4K	√
Wan-2.6	Text, Image	5s;10s;15s	1080P, 720P	√

How to Use Happy Horse on Atlas Cloud

Get started in minutes — follow these simple steps to integrate and deploy models through Atlas Cloud's platform.

Create an Atlas Cloud Account

Sign up at atlascloud.ai and complete verification. New users receive free credits to explore the platform and test models.

Why Use Happy Horse on Atlas Cloud

Combining the advanced Happy Horse models with Atlas Cloud's GPU-accelerated platform provides unmatched performance, scalability, and developer experience.

Performance & flexibility

Low Latency:
GPU-optimized inference for real-time reasoning.

Unified API:
Run Happy Horse, GPT, Gemini, and DeepSeek with one integration.

Transparent Pricing:
Predictable per-token billing with serverless options.

Enterprise & Scale

Developer Experience:
SDKs, analytics, fine-tuning tools, and templates.

Reliability:
99.99% uptime, RBAC, and compliance-ready logging.

Security & Compliance:
SOC 2 Type II, HIPAA alignment, data sovereignty in the US.

HappyHorse API Questions, Answered

The HappyHorse API gives developers programmatic access to HappyHorse, Alibaba's video generation model family that debuted at #1 on the Artificial Analysis Video Arena for both text-to-video and image-to-video. Built on a unified multimodal Transformer, it produces up to 1080p video with synchronized audio in a single pass. Atlas Cloud serves both HappyHorse 1.0 and 1.1 through one OpenAI-compatible key with pay-as-you-go pricing.

Both versions share the same architecture and the same text-to-video, image-to-video, and reference-to-video endpoints. HappyHorse 1.1 improves motion expressiveness, prompt adherence, and multi-reference fusion, so complex multi-shot narratives and brand-consistent clips come out more stable. HappyHorse 1.0 additionally offers a video-edit endpoint that transforms existing footage with text instructions.

Create an Atlas Cloud account, generate an API key from your dashboard, and send a request to any HappyHorse endpoint. Generation runs asynchronously: submit a prompt, receive a task ID, then poll the status endpoint until your video is ready. There is no subscription to set up since billing is pay-as-you-go per generation. Start building today.
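As a rough sketch of the submission step, the snippet below builds (but does not send) an authenticated POST request using only the Python standard library. The base URL, the `/video/generations` path, the Bearer-token header, and the `model` and `prompt` field names are placeholders assumed for illustration; the actual values come from the Atlas Cloud API reference.

```python
import json
import urllib.request

# Placeholder base URL; substitute the one given in the Atlas Cloud docs.
BASE_URL = "https://api.example.invalid/v1"


def build_submit_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request that submits one generation task (hypothetical schema)."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/video/generations",  # assumed path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen(req)` and reading the JSON response would then yield the task ID to poll; the name of that response field is likewise something to confirm against the real API.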

Text-to-video, image-to-video, and reference-to-video are available on both HappyHorse 1.0 and 1.1, while 1.0 adds a video-edit mode for reworking existing footage. Every mode accepts prompts up to 2500 characters and returns video through the same task-based workflow, so switching between modes requires only changing the model path in your request.
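The mode availability and prompt limit described above can be checked before a request is sent. The sketch below is illustrative: the `happyhorse-<version>/<mode>` path format is a hypothetical naming scheme, not a confirmed Atlas Cloud model path, while the mode lists and the 2500-character limit are taken from the text.

```python
MAX_PROMPT_CHARS = 2500  # limit stated in the text above

# Modes per version as described in the text; video-edit is listed for 1.0 only.
MODES = {
    "1.0": {"text-to-video", "image-to-video", "reference-to-video", "video-edit"},
    "1.1": {"text-to-video", "image-to-video", "reference-to-video"},
}


def model_path(version: str, mode: str) -> str:
    """Return a hypothetical model path such as 'happyhorse-1.1/text-to-video'."""
    if version not in MODES:
        raise ValueError(f"unknown version: {version}")
    if mode not in MODES[version]:
        raise ValueError(f"{mode} is not listed for HappyHorse {version}")
    return f"happyhorse-{version}/{mode}"


def check_prompt(prompt: str) -> str:
    """Reject empty prompts and prompts over the documented character limit."""
    if not prompt.strip():
        raise ValueError("prompt is empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt has {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}")
    return prompt
```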

Output renders at 720P or 1080P, with clip length adjustable from 3 to 15 seconds. Nine aspect ratios are supported: 16:9, 9:16, 1:1, 4:3, 3:4, 4:5, 5:4, 21:9, and 9:21, covering everything from vertical social clips to ultrawide cinematic frames. A seed parameter also lets you reproduce a generation you like.
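One way to catch invalid settings before spending a generation is to validate them locally. The sketch below encodes the resolutions, duration range, and aspect ratios listed above; the field names, the default values, and the assumption that duration is a whole number of seconds are illustrative choices, not the API's documented schema.

```python
from dataclasses import dataclass
from typing import Optional

RESOLUTIONS = {"720P", "1080P"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "4:5", "5:4", "21:9", "9:21"}


@dataclass(frozen=True)
class VideoParams:
    # Defaults are arbitrary picks for this sketch, not API defaults.
    resolution: str = "1080P"
    duration: int = 5  # seconds
    aspect_ratio: str = "16:9"
    seed: Optional[int] = None  # reuse a seed to reproduce a generation

    def __post_init__(self) -> None:
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
        if not 3 <= self.duration <= 15:
            raise ValueError("duration must be between 3 and 15 seconds")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
```

For example, `VideoParams(resolution="720P", duration=8, aspect_ratio="9:16", seed=42)` would describe a vertical social clip, while `VideoParams(duration=20)` raises a `ValueError`.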

Need consistent characters or brand elements across a clip? Reference-to-video accepts one to nine reference images alongside a text prompt and keeps identity, product details, and style stable through the full video. Each image can be JPEG, JPG, PNG, or WEBP up to 20MB, with at least 400 pixels on the shorter side.
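The reference image limits above lend themselves to a pre-upload check. This is a minimal sketch over image metadata that the caller has already gathered; the `ImageInfo` type is hypothetical, and reading "20MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_FORMATS = {"JPEG", "JPG", "PNG", "WEBP"}
MAX_BYTES = 20 * 1024 * 1024  # "20MB" interpreted as 20 MiB (assumption)
MIN_SHORT_SIDE = 400  # pixels
MAX_IMAGES = 9


@dataclass
class ImageInfo:
    """Metadata for one reference image (hypothetical helper type)."""
    name: str
    fmt: str
    size_bytes: int
    width: int
    height: int


def reference_problems(images: List[ImageInfo]) -> List[str]:
    """Return human-readable problems; an empty list means the set looks valid."""
    problems = []
    if not 1 <= len(images) <= MAX_IMAGES:
        problems.append(f"expected 1 to {MAX_IMAGES} images, got {len(images)}")
    for img in images:
        if img.fmt.upper() not in ALLOWED_FORMATS:
            problems.append(f"{img.name}: unsupported format {img.fmt}")
        if img.size_bytes > MAX_BYTES:
            problems.append(f"{img.name}: larger than 20MB")
        if min(img.width, img.height) < MIN_SHORT_SIDE:
            problems.append(f"{img.name}: shorter side under {MIN_SHORT_SIDE}px")
    return problems
```

Collecting every problem rather than stopping at the first makes it easier to fix a whole batch of reference images in one pass.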

Yes. HappyHorse produces dialogue, ambient sound, and Foley effects together with the video in one forward pass instead of relying on a separate audio pipeline. Native lip sync covers seven languages: Mandarin, Cantonese, English, Japanese, Korean, German, and French, which makes it practical for multilingual campaigns without extra dubbing work.

Every HappyHorse API endpoint on Atlas Cloud is priced at $0.14 per generation, billed pay-as-you-go with no subscription or minimum commitment. Whether you call text-to-video on 1.0 or reference-to-video on 1.1, the per-call price stays the same, so costs remain predictable as your volume grows. Start today.
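This page quotes the price two ways: "From $0.14/SEC" in the model listing and $0.14 per generation in the paragraph above. As a budgeting sketch that does not assume either reading, the function below takes the rate and the billing unit as inputs; both defaults are assumptions to replace with the figures on the current pricing page.

```python
from decimal import Decimal


def estimate_cost(
    clips: int,
    seconds_per_clip: int,
    rate: Decimal = Decimal("0.14"),  # USD, figure quoted on this page
    per_second: bool = True,  # False models flat per-generation billing
) -> Decimal:
    """Estimate spend in USD for a batch of clips under the chosen billing unit."""
    if clips < 0 or seconds_per_clip < 0:
        raise ValueError("clips and seconds_per_clip must be non-negative")
    units = clips * seconds_per_clip if per_second else clips
    return rate * units
```

Using `Decimal` rather than floats keeps currency arithmetic exact, which matters once totals are summed across many batches.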

When HappyHorse 1.0 entered the Artificial Analysis Video Arena in April 2026, it immediately ranked #1 in both text-to-video and image-to-video based on blind human preference voting. In image-to-video without audio it reached an Elo of 1402, ahead of Seedance 2.0 at 1355 and Grok Imagine Video at 1331. Arena standings shift as new models launch, so check the live leaderboard for current rankings.
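To put the Elo gaps in perspective, a rating difference can be converted into an expected head-to-head preference rate. The sketch below uses the standard Elo logistic formula with a 400-point scale; whether the Arena computes its ratings exactly this way is an assumption, so treat the result as an approximation.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise comparisons that A wins against B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))
```

Under that assumption, `elo_expected_score(1402, 1355)` comes out at roughly 0.57, meaning a 47-point lead corresponds to being preferred in a little over half of blind comparisons rather than in nearly all of them.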

Explore More Families

Seedance 2.0

The Seedance 2.0 API gives you production access to ByteDance's multimodal video model — quad-modal inputs (text, image, video, audio) and an industry-leading "Universal Reference" system that locks composition, camera movement, and character actions across shots. Integrate director-level control with one API call, a flat $0.09/s, instant key, and no waitlist — backed by enterprise-grade uptime and compliance. Seedance 2.0 Native 4K is now live!

Grok Imagine

The Grok Imagine API gives developers xAI's image, video, and audio generation in one suite. It produces up to 2K images with multilingual text rendering, plus video up to 15 seconds with native, synchronized audio and reference-based editing. On Atlas Cloud one key runs every Grok Imagine mode, so you move between image, video, and audio without separate setups, from $0.02 per image and $0.05 per second.

Gemini Omni Flash

The Gemini Omni API brings Google DeepMind's multimodal video generation and editing model, introduced at Google I/O 2026, to your stack. Gemini Omni fuses Gemini's reasoning engine with generative media, accepting any mix of text, images, video, and audio to produce consistent, knowledge-grounded output. Refine results through natural conversation, swapping objects, rewriting scenes, and shifting styles while physics, characters, and continuity stay intact. Atlas Cloud serves the full Gemini Omni Flash lineup, text-to-video, image-to-video with up to 7 reference images, and reference-to-video, through one unified API with transparent per-second pricing from $0.112 and no subscription. Start building today.

GPT Image 2

The GPT Image 2 API gives developers access to OpenAI's latest image model, the successor to GPT Image 1.5. It generates and edits images with accurate text rendering across Latin and CJK scripts, plus strong composition for posters, mockups, and infographics. On Atlas Cloud you reach it through one unified API alongside 300+ models, with free credits, 99.99% uptime, and no OpenAI organization verification required.

Google

Google's most powerful creative models are all available on Atlas Cloud. Veo 3.1 delivers cinematic video generation, Nano Banana 2 powers high-fidelity image creation, and Gemini brings multimodal intelligence to every workflow. Access the full Google model suite through one API key with Day-0 availability and pay-as-you-go pricing.

Seedance 2.0 Mini

The Seedance 2.0 Mini API is the lightest, lowest-cost tier of ByteDance's Seedance video line, built for teams where throughput and unit cost matter more than maximum polish. Use it for batch generation, rapid prototyping, and draft passes, all through one OpenAI-compatible key on Atlas Cloud.

ByteDance

From cinematic video generation to high-fidelity image creation, ByteDance's most powerful models are live on Atlas Cloud. Run Seedance and Seedream at scale with the lowest inference pricing and zero infrastructure overhead.

Alibaba

Atlas Cloud brings together Alibaba's full model lineup under one API: Qwen for language and image tasks, Wan for video generation up to 1080p. Access every model pay-as-you-go with no subscriptions. The Alibaba API is available via a single base URL using your existing OpenAI-compatible client.

OpenAI

Atlas Cloud gives you access to the full OpenAI API lineup, from GPT Image 2 for image generation to Sora 2 for video. Every model is available pay-as-you-go with no monthly commitment. Plug in with a single base URL swap using the OpenAI-compatible API.

xAI

Build complete image and video pipelines using the xAI API on Atlas Cloud. Generate at 2K, edit with reference images, and animate images into audio-synced clips.

Kwaivgi

The Kwaivgi API at 15% off standard rates. Day-0 access to every new Kling release, pay-as-you-go, no seat limits. One account covers the full Kling lineup.

Seedream 5.0 Pro

Seedream 5.0 Pro API gives developers ByteDance's controllable image editing model on Atlas Cloud. It places edits precisely with anchors and coordinates, separates images into editable layers, fuses multiple references, and matches exact colors and materials, with multilingual text at 2K and 3K. On Atlas Cloud you reach it through one key!

One API for All Media AI.
