alibaba/wan-2.2/i2v-720p

Open and Advanced Large-Scale Video Generative Models.

IMAGE-TO-VIDEO · NEW

Wan 2.2 AI Video Model

Wan 2.2 is a new-generation multimodal generative model launched by WAN AI. It adopts a Mixture-of-Experts (MoE) architecture made up of a high-noise expert model and a low-noise expert model. Each denoising timestep is handled by the expert suited to its noise level, which yields higher-quality video content.
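The timestep-based split between the two experts can be pictured with a small routing sketch. This is a toy illustration, not the Wan 2.2 implementation: the expert functions, the `boundary` value, and the timestep schedule are all hypothetical placeholders chosen for the example.

```python
from typing import Callable, List, Tuple

# An "expert" here is any callable that maps (latent, timestep) to a
# slightly denoised latent. Real experts are large neural networks.
Expert = Callable[[List[float], int], List[float]]


def high_noise_expert(latent: List[float], t: int) -> List[float]:
    """Toy stand-in for the expert used at early, noisy timesteps."""
    return [x * 0.5 for x in latent]


def low_noise_expert(latent: List[float], t: int) -> List[float]:
    """Toy stand-in for the expert used at late, low-noise timesteps."""
    return [x * 0.9 for x in latent]


def pick_expert(t: int, boundary: int) -> Tuple[str, Expert]:
    # Assumption: timesteps at or above `boundary` count as "high noise".
    if t >= boundary:
        return "high", high_noise_expert
    return "low", low_noise_expert


def denoise(latent: List[float], timesteps: List[int], boundary: int):
    """Run the schedule, calling exactly one expert per timestep."""
    route = []
    for t in timesteps:
        name, expert = pick_expert(t, boundary)
        latent = expert(latent, t)
        route.append(name)
    return latent, route


# Hypothetical 4-step schedule with a hypothetical boundary of 500.
latent, route = denoise([1.0, -2.0], timesteps=[900, 600, 300, 0], boundary=500)
print(route)
```

Only one expert runs at each step, so adding a second expert increases total parameters without increasing the work done per step.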

Wan2.2 focuses on the following innovations:

  • Effective MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps among specialized expert models, it enlarges overall model capacity while keeping the computational cost the same.

  • Cinematic-level Aesthetics: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.

  • Complex Motion Generation: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with 65.6% more images and 83.2% more videos. This expansion notably improves the model's generalization across dimensions such as motion, semantics, and aesthetics and, according to the developers, achieves top performance among open-source and closed-source models.

  • Efficient High-Definition Hybrid TI2V: Wan2.2 open-sources a 5B model built with the Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution and 24fps, and it can run on consumer-grade graphics cards such as the RTX 4090. It is one of the fastest 720P@24fps models currently available and can serve both industrial and academic users.
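To get a feel for what a 16×16×4 compression ratio means, the arithmetic can be sketched as follows. The sketch assumes that 16×16 is the spatial (height × width) factor, that 4 is the temporal factor, that 720P means 1280×720, and that sizes round up; the real VAE may treat the first frame or padding differently, and latent channel count is ignored here.

```python
import math


def latent_shape(frames: int, height: int, width: int,
                 t_ratio: int = 4, h_ratio: int = 16, w_ratio: int = 16):
    """Return (latent_frames, latent_height, latent_width).

    The ratios are assumptions read off the stated 16x16x4 figure.
    """
    return (
        math.ceil(frames / t_ratio),
        math.ceil(height / h_ratio),
        math.ceil(width / w_ratio),
    )


def position_reduction(t_ratio: int = 4, h_ratio: int = 16, w_ratio: int = 16) -> int:
    """How many pixel positions collapse into one latent position."""
    return t_ratio * h_ratio * w_ratio


# Hypothetical clip: 5 seconds at 24fps, 1280x720.
shape = latent_shape(frames=120, height=720, width=1280)
print(shape)                 # (30, 45, 80)
print(position_reduction())  # 1024
```

Under these assumptions a 120-frame 720P clip is represented by 30 × 45 × 80 latent positions, roughly a thousandfold reduction in positions the diffusion model has to process.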

Key Features of Wan 2.2

  • Cinematic-level aesthetic control: deeply integrates professional film-industry aesthetic standards and supports multi-dimensional visual control over lighting, color, and composition.
  • Large-scale complex motion: reproduces a wide range of complex motions with improved smoothness and controllability.
  • Precise semantic compliance: excels at complex scenes and multi-object generation, and more faithfully realizes the user's creative intent.

The model supports multiple generation modes, including text-to-video and image-to-video, and suits content creation, artistic work, education and training, and other application scenarios.

Model Highlights

  • Cinematic-level Aesthetic Control: Professional camera language, with multi-dimensional visual control over lighting, color, and composition
  • Large-scale Complex Motion: Smoothly reproduces a wide range of complex motions, with better motion controllability and naturalness
  • Precise Semantic Compliance: Complex scene understanding and multi-object generation that more faithfully realize creative intent

Detailed Specifications

Overview:

Model provider: QWEN
Model type: image-to-video
Deployment: Inference API; Playground
Pricing: $0.3000/second
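As a rough budgeting aid, the listed price of $0.3000/second can be turned into a per-clip estimate. This sketch assumes billing is per second of generated video and rounds to whole cents; the page does not state the billing unit or rounding rules, and the function name is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed price; the interpretation "per second of output video" is an assumption.
PRICE_PER_SECOND = Decimal("0.3000")


def estimate_cost(duration_seconds: int, clips: int = 1) -> Decimal:
    """Estimate the total price in USD for `clips` videos of equal length."""
    total = PRICE_PER_SECOND * duration_seconds * clips
    # Round to cents for display; actual invoicing may round differently.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(5))            # 1.50
print(estimate_cost(5, clips=20))  # 30.00
```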

Key parameters:

Size limit: maximum width × height (user-configurable)
LoRA support:
Seed options: N/A