alibaba/wan-2.5/video-extend-fast

Text-to-Video

Extend your videos, with audio, using Alibaba's Wan 2.5 video-extender model.

Wan 2.5: A next-generation AI video generation model developed by Alibaba Wanxiang.

Model Card Overview

Field	Description
Model Name	Wan 2.5
Developed By	Alibaba Group
Release Date	September 24, 2025
Model Type	Generative AI, Video Foundation Model
Related Links	Official Website: https://wan.video/, Hugging Face: https://huggingface.co/Wan-AI, Technical Paper (Wan Series): https://arxiv.org/abs/2503.20314

Introduction

Wan 2.5 is a state-of-the-art, open-source video foundation model developed by Alibaba's Wan AI team. It is designed to generate high-quality, cinematic videos complete with synchronized audio directly from text or image prompts. The model represents a significant advancement in the field of generative AI, aiming to lower the barrier for creative video production. Its core contribution lies in its ability to produce coherent, dynamic, and narratively consistent video clips with a high degree of realism and integrated audio-visual elements, such as lip-sync and sound effects, in a single, streamlined process.

Key Features & Innovations

Wan 2.5 introduces several key features that distinguish it from previous models and competitors:

Unified Audio-Visual Synthesis: Unlike many models that require separate steps for video and audio generation, Wan 2.5 creates video with natively synchronized audio, including voice, sound effects, and lip-sync, in one step.
High-Fidelity, High-Resolution Output: The model is capable of generating videos in multiple resolutions, including 480p, 720p, and full 1080p HD, with significant improvements in visual quality and frame-to-frame stability over its predecessors.
Extended Video Duration: Wan 2.5 can generate video clips up to 10 seconds in length, offering more creative flexibility for storytelling compared to other models in its class.
Advanced Cinematic Control: The model demonstrates a sophisticated understanding of cinematic language, allowing for precise control over camera movement, shot composition, and character consistency within scenes.
Open-Source Commitment: Following the precedent set by earlier versions, the Wan series of models, including Wan 2.5, are open-sourced to encourage research, development, and innovation within the broader AI community.

Model Architecture & Technical Details

Wan 2.5 is built upon the Diffusion Transformer (DiT) paradigm, which has become a mainstream approach for high-quality generative tasks. The technical report for the Wan model series outlines a suite of innovations that contribute to its performance.

The architecture includes a novel Variational Autoencoder (VAE) designed for high-efficiency video compression, which lets the model handle high-resolution video data efficiently. The Wan series is available in multiple sizes to balance performance against computational requirements, such as the 1.3B- and 14B-parameter models described in the technical report for Wan 2.1. The Wan models were trained on a large, curated dataset comprising billions of images and videos, which improves their ability to generalize across a wide range of motions, semantics, and aesthetic styles.
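To make the compression point concrete, the sketch below computes the VAE latent shape and the number of DiT tokens for a clip. It assumes the factors published in the Wan 2.1 technical report (a causal 3D VAE with 4× temporal and 8×8 spatial compression into 16 latent channels, followed by 1×2×2 patchification in the DiT); this page does not say whether Wan 2.5 uses the same factors, and the function names are hypothetical.

```python
# Compression factors from the Wan 2.1 technical report (assumed, not
# confirmed, to carry over to Wan 2.5).
TEMPORAL_STRIDE = 4      # causal VAE: 1 + 4k frames -> 1 + k latent frames
SPATIAL_STRIDE = 8       # 8x downsampling in both height and width
LATENT_CHANNELS = 16     # channels in the VAE latent
PATCH = (1, 2, 2)        # DiT patch size over (time, height, width)


def latent_shape(frames: int, height: int, width: int) -> tuple:
    """Return the (channels, time, height, width) shape of the VAE latent."""
    if frames < 1 or (frames - 1) % TEMPORAL_STRIDE != 0:
        raise ValueError("frame count must be 1 + 4k for a causal 4x temporal VAE")
    if height % SPATIAL_STRIDE or width % SPATIAL_STRIDE:
        raise ValueError("height and width must be multiples of 8")
    t = 1 + (frames - 1) // TEMPORAL_STRIDE
    return (LATENT_CHANNELS, t, height // SPATIAL_STRIDE, width // SPATIAL_STRIDE)


def dit_token_count(frames: int, height: int, width: int) -> int:
    """Number of tokens the DiT attends over after patchifying the latent."""
    _, t, h, w = latent_shape(frames, height, width)
    pt, ph, pw = PATCH
    if t % pt or h % ph or w % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    return (t // pt) * (h // ph) * (w // pw)


if __name__ == "__main__":
    print(latent_shape(81, 480, 832))     # (16, 21, 60, 104)
    print(dit_token_count(81, 480, 832))  # 32760
```

Under these assumptions, an 81-frame 832×480 clip becomes a 16×21×60×104 latent and 32,760 transformer tokens, which shows why aggressive VAE compression is what makes high-resolution video tractable for a DiT.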

Intended Use & Applications

Wan 2.5 is designed for a wide array of applications in creative and commercial fields. Its intended uses include:

Content Creation: Generating short-form videos for social media, marketing campaigns, and digital advertising.
Storytelling and Filmmaking: Creating cinematic scenes, character animations, and narrative sequences for short films and conceptual art.
Prototyping: Rapidly visualizing scripts and storyboards for film, television, and game development.
Personalized Media: Enabling users to create unique, personalized video content from their own ideas and images.

Performance

Wan 2.5 has demonstrated significant performance improvements over previous versions and holds a competitive position against other leading video generation models. Independent reviews and benchmarks provide insight into its capabilities.

Benchmark Scores

A review conducted by Curious Refuge Labs™ evaluated the model's visual generation capabilities across several metrics.

Metric	Score (out of 10)
Prompt Adherence	7.0
Temporal Consistency	6.6
Visual Fidelity	6.5
Motion Quality	5.9
Style & Cinematic Realism	5.7
Overall Score	6.3

These scores indicate strong prompt understanding and a notable improvement in visual quality from Wan 2.2, although it still shows limitations in complex motion and realism compared to top-tier commercial models.
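The review's aggregation method is not given here, but if the overall score is an unweighted mean of the five metrics (an assumption), the table is internally consistent, as this quick check shows:

```python
from statistics import mean

# Per-metric scores from the Curious Refuge Labs review table above.
scores = {
    "Prompt Adherence": 7.0,
    "Temporal Consistency": 6.6,
    "Visual Fidelity": 6.5,
    "Motion Quality": 5.9,
    "Style & Cinematic Realism": 5.7,
}

# Assumption: overall = unweighted mean, rounded to one decimal place.
overall = round(mean(list(scores.values())), 1)
print(overall)  # 6.3
```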

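Detailed Specifications

Overview:

Model provider: QWEN

Model type: text-to-video

Deployment: Inference API; Playground

Pricing: $0.34/second

Key Parameters:

Maximum size: maximum width × height (user-configurable)

LoRA support: No

Seed option: N/A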
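Assuming billing is a flat $0.34 per second of generated video (how partial seconds, or the source clip in an extension job, are billed is not specified here), estimating a job's cost is simple arithmetic. This hypothetical helper uses `Decimal` so currency values are not distorted by binary floating point:

```python
from decimal import Decimal

# Listed price for this model: $0.34 per second of output (flat rate assumed).
PRICE_PER_SECOND = Decimal("0.34")


def estimate_cost_usd(seconds: int) -> Decimal:
    """Estimated charge in USD for a clip of the given whole-second length."""
    if seconds <= 0:
        raise ValueError("duration must be a positive number of seconds")
    return PRICE_PER_SECOND * seconds


if __name__ == "__main__":
    for s in (5, 10):
        print(f"{s}s -> ${estimate_cost_usd(s)}")
```

Under that assumption, a 5-second extension costs $1.70 and a 10-second one $3.40.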


Explore Similar Models


Image-to-Video

Wan-2.6 Image-to-video Flash

Wan 2.6 Image-to-Video Flash offers faster, more cost-effective generation. Intelligent shot scheduling enables multi-camera storytelling, and it supports stable multi-speaker dialogue with more natural, realistic vocal timbres.

$0.0175/second


Video-to-Video

Wan-2.6 Video-to-video

A speed-optimized video-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.

$0.07/second


Image-to-Video

Wan-2.6 Image-to-video

A speed-optimized image-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.

Wan-2.6 Text-to-video

A speed-optimized text-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.

$0.07/second

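Wan 2.5: The Choice for Smart Video Creators

Why Choose Wan 2.5?

More Affordable

Although Google recently cut its prices, Veo 3 remains expensive overall. Wan 2.5 is lightweight and cost-effective, giving creators more options while significantly reducing production costs.

One-Step Generation, End-to-End Sync

With Wan 2.5, there is no need to record voiceovers separately or align lip movements by hand. Provide a clear, structured prompt and it generates a complete video with audio/dubbing and lip-sync in a single pass, faster and simpler.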
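The showcase prompts on this page share a consistent structure: a scene description, a quoted spoken line, and a list of background sounds. The helper below is a hypothetical sketch that assembles prompts in that shape; the class and field names are illustrative only and are not part of any Wan or Atlas Cloud API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AVPrompt:
    """Hypothetical container for a structured audio-visual prompt."""
    scene: str                                       # what the camera sees
    speaker: Optional[str] = None                    # who talks, e.g. "He"
    line: Optional[str] = None                       # dialogue, quoted in the prompt
    sounds: List[str] = field(default_factory=list)  # ambient sound cues

    def render(self) -> str:
        """Join the parts into one prompt string, skipping empty sections."""
        parts = [self.scene.strip()]
        if self.line:
            who = self.speaker or "The character"
            parts.append(f"{who} says: '{self.line}'")
        if self.sounds:
            parts.append("Background sounds: " + ", ".join(self.sounds) + ".")
        return " ".join(parts)


if __name__ == "__main__":
    prompt = AVPrompt(
        scene="A middle-aged man sitting at a wooden desk in a cozy study room.",
        speaker="He",
        line="History teaches us more than just facts.",
        sounds=["pages turning", "the faint ticking of a clock"],
    )
    print(prompt.render())
```

Keeping dialogue and sound cues in separate fields makes it easy to vary one element (for example, translating only the spoken line) while holding the rest of the scene fixed.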

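Multilingual-Friendly

When prompted in Chinese, Wan 2.5 reliably generates videos with synchronized audio and video. By contrast, Veo 3 often reports an "unknown language" for Chinese prompts.

Precise Character Restoration

Wan 2.5 excels at reproducing character traits, accurately rendering a character's appearance, expressions, and movement style. This makes generated characters more recognizable and distinctive, strengthening storytelling and immersion.

Artistic Style Rendering

Supports Ghibli-style rendering with hand-painted watercolor textures and animated effects, delivering a warm, dreamy visual experience that heightens artistic appeal and narrative depth.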

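Who Benefits?

Marketing Teams

Whether for product launches, promotions, or brand marketing, Wan 2.5 helps you quickly generate high-quality videos, making creation simple and efficient.

Product demos and tutorials without coordination hassles
Social media marketing with multilingual subtitles and lip-sync
AI-generated content lets teams focus on strategy and creativity

Bottom line: Creating has never been this simple, fast, and smart. Wan 2.5 is your marketing secret weapon!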

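Global Enterprises

An ideal content-localization solution for multinational companies, making creation easier and more efficient.

Multilingual video support with prompt recognition
One-click generation of lip-synced subtitles and dubbing
Fast content localization for global markets

Bottom line: Cross-border content creation has never been this simple, fast, and smart.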

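Storytellers / YouTubers

Creators can use Wan 2.5 to produce videos more efficiently while maintaining high-quality output.

Immersive storytelling with precise character movements and expressions
Faster publishing, with less editing and post-production time
Diverse content, from short videos to animated story clips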

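Corporate Training Teams

Wan 2.5 makes corporate training more efficient and engaging.

Professional videos replace dry text documents
Quickly create operational demos and training tutorials
Consistent style and standardized output for global rollout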

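Freelance Creatives / Small Studios

Wan 2.5 lets creativity flow freely without expensive equipment or actors; the AI generates everything efficiently.

Experiment with diverse work, from short films to social media content
Go from inspiration to finished piece with one-click generation
High-quality content without expensive equipment or professional actors

Bottom line: Wan 2.5 makes creating easier, freer, and more exciting, with stunning results every time!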

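Educational Institutions / Online Course Creators

Turn ideas into reality without high costs: Wan 2.5 makes producing quality content simple and affordable.

Try a range of styles, from short films to promotional videos
Greater production efficiency from concept to finished product
Quality content without expensive equipment or professional talent

Bottom line: Wan 2.5 makes creating easy, efficient, and free, with brilliant results every time!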

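Core Features

One-Step Audio-Video Generation

Generates a complete video with synchronized audio, dubbing, and lip-sync in a single pass

Dual-Character Sync

Generates two characters at once, with synchronized movements, expressions, and lip-sync for natural interaction

Professional Quality

High-quality video output with lifelike character expressions and precise lip-sync

Multilingual Support

Excellent support for Chinese prompts, reliably generating multilingual content

Cost-Effective

Significantly lower cost than competing products while maintaining professional quality

Character Restoration

Accurately reproduces character appearance, expressions, and movement style with high fidelity and personalization

Artistic Style Rendering

Supports a variety of art styles, including Ghibli-style hand-painted watercolor textures

Immersive Scenes

Ideal for dialogue scenes, interviews, or two-person shorts, with natural audio-visual consistency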

Wan 2.5 Prompt Showcase

Discover the power of Wan 2.5 through these curated examples. From digital-human lip-sync to dual-character scenes, and from artistic rendering to character restoration, experience the possibilities.

Digital Human Sync

Study Room Scholar

Middle-aged man reading with perfect lip-sync in a warm study environment

Lip-sync with audio, Environmental sounds, Character emotion

Prompt

A middle-aged man sitting at a wooden desk in a cozy study room, surrounded by bookshelves and a warm lamp glow. He opens an old book and reads aloud with a calm, deep voice: 'History teaches us more than just facts… it shows us who we are.' The room has subtle background sounds: pages turning, the faint ticking of a clock, and distant rain against the window.

Dual Character Scene

Park Sunset Romance

Couple interaction with synchronized dual character actions and expressions

Dual character sync, Natural interaction, Ambient soundscape

Prompt

A young couple sitting on a park bench during sunset. The woman leans her head on the man's shoulder. He whispers softly: 'No matter where we go, I'll always be here with you.' The sound includes the rustling of leaves, distant laughter of children playing, and the gentle hum of cicadas in the evening air.

Character Restoration

Ballet Performance Art

Precise character trait restoration with artistic movement and expression

Character trait restoration, Movement precision, Artistic lighting

Prompt

A graceful ballerina with her hair in a messy bun, performing a powerful and emotional contemporary ballet routine. She is in a minimalist, dark art studio. Abstract patterns of light and shadow, projected from a hidden source, dance across her body and the surrounding walls, constantly shifting with her movements. The camera focuses on the tension in her muscles and the expressive gestures of her hands. A single, dramatic slow-motion shot captures her mid-air leap, with the light patterns swirling around her like a galaxy. Moody, artistic, high contrast.

Artistic Style Rendering

Ghibli Forest Magic

Studio Ghibli-inspired animation with hand-painted watercolor texture

Ghibli art style, Hand-painted texture, Magical atmosphere

Prompt

Studio Ghibli-inspired anime style. A young girl with a straw hat lies peacefully in a sun-dappled magical forest, surrounded by friendly, glowing forest spirits (Kodama). A gentle breeze rustles the leaves of the giant, ancient trees. The air is filled with sparkling dust motes, illuminated by shafts of sunlight. The art style is soft, with a hand-painted watercolor texture. The scene feels serene, magical, and heartwarming.

Experience these prompts and discover the creative potential of Wan 2.5's synchronized A/V generation technology.

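Use Cases

🎬 Video Production
📢 Marketing Content
🎓 Educational Videos
📱 Social Media
🌐 Multilingual Content
💼 Corporate Training
🎭 Entertainment
💃 Performing Arts
🎨 Animation and Anime
📚 Storytelling
👥 Dual-Character Videos
🎙️ Interviews
📺 Broadcast Media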

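Technical Specifications

Model type: Synchronized audio-video generation

Core features: Audio-video sync, character restoration, artistic rendering, multilingual support

Language support: Chinese, English, and others

Output quality: Professional HD video with audio

Generation speed: Fast, one-step generation

API integration: RESTful API with full documentation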
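This page does not document the request format, so the following is only a sketch of how a client might assemble and validate a JSON body for this model before sending it. The model ID comes from this page; the field names (`video`, `prompt`, `resolution`, `duration`), the allowed values, and the 10-second cap are assumptions drawn from the feature list above, not a confirmed API contract, so check the provider's API reference before relying on them.

```python
import json

MODEL_ID = "alibaba/wan-2.5/video-extend-fast"   # model ID shown on this page
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}  # resolutions listed in the features
MAX_DURATION_S = 10                              # "up to 10 seconds" (assumed to apply)


def build_extend_request(video_url: str, prompt: str,
                         resolution: str = "720p", duration: int = 5) -> str:
    """Return a JSON request body; every field name here is a hypothetical placeholder."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 1 <= duration <= MAX_DURATION_S:
        raise ValueError(f"duration must be between 1 and {MAX_DURATION_S} seconds")
    body = {
        "model": MODEL_ID,
        "video": video_url,       # source clip to extend (hypothetical field)
        "prompt": prompt,         # how the extension should continue
        "resolution": resolution,
        "duration": duration,     # seconds of new footage (hypothetical field)
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_extend_request("https://example.com/clip.mp4",
                               "The camera keeps panning right as rain begins to fall."))
```

Validating locally keeps obviously invalid requests from consuming a billed API call; the actual HTTP transport and authentication are left out because the page does not describe them.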

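Try Wan 2.5: Your Video Creation Revolution

Join thousands of creators and businesses transforming their video content with synchronized audio-video generation.

🎬 One-step audio-video sync

🌍 Multilingual support

⚡ Cost-effective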

300+ models, ready to launch, all on Atlas Cloud.
