alibaba/wan-2.2/i2v-720p

Open and Advanced Large-Scale Video Generative Models.

IMAGE-TO-VIDEO

Wan 2.2 AI Video Model

Wan 2.2 is a new-generation multimodal generative model launched by WAN AI. It adopts an innovative Mixture-of-Experts (MoE) architecture consisting of a high-noise expert model and a low-noise expert model. Each expert handles a different range of denoising timesteps, which lets the model generate higher-quality video content.
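The timestep-based split between the two experts can be sketched as a router that sends every denoising step to exactly one expert. This is a minimal illustration, not Wan 2.2's implementation: the class `TwoExpertDenoiser`, the single integer `boundary`, and the toy experts are all hypothetical, and real latents are tensors processed by large diffusion transformers rather than floats.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A toy "expert": takes (latent, timestep) and returns a less noisy latent.
# Real experts are large diffusion transformers operating on tensors.
Expert = Callable[[float, int], float]


@dataclass
class TwoExpertDenoiser:
    """Hypothetical router: each denoising step runs exactly one expert.

    Assumes timesteps count down from num_steps - 1 to 0. Steps with
    t >= boundary (early, high noise) go to the high-noise expert; the
    rest (late, low noise) go to the low-noise expert. The boundary is
    illustrative, not Wan 2.2's actual switching rule.
    """
    high_noise_expert: Expert
    low_noise_expert: Expert
    boundary: int
    calls: List[str] = field(default_factory=list)  # which expert ran, per step

    def step(self, latent: float, t: int) -> float:
        if t >= self.boundary:
            self.calls.append("high")
            return self.high_noise_expert(latent, t)
        self.calls.append("low")
        return self.low_noise_expert(latent, t)

    def sample(self, latent: float, num_steps: int) -> float:
        # Walk from the noisiest timestep down to 0.
        for t in reversed(range(num_steps)):
            latent = self.step(latent, t)
        return latent


# Toy experts that just shrink the value, standing in for denoising.
denoiser = TwoExpertDenoiser(
    high_noise_expert=lambda x, t: x * 0.5,
    low_noise_expert=lambda x, t: x * 0.9,
    boundary=7,
)
result = denoiser.sample(1.0, num_steps=10)
print(denoiser.calls)  # ['high', 'high', 'high', 'low', 'low', 'low', 'low', 'low', 'low', 'low']
```

Because only one expert's forward pass runs per step, the per-step cost in this sketch matches a single expert, while the total parameter count covers both experts.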

Wan2.2 focuses on the following innovations:

  • Effective MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps among specialized, powerful expert models, it enlarges overall model capacity while keeping the same computational cost, since only one expert runs at each denoising step.

  • Cinematic-level Aesthetics: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This enables more precise and controllable cinematic-style generation, so videos can be tailored to specific aesthetic preferences.

  • Complex Motion Generation: Compared with Wan2.1, Wan2.2 is trained on significantly more data: 65.6% more images and 83.2% more videos. This expansion notably improves the model's generalization across motion, semantics, and aesthetics, and the Wan team reports top performance among both open-source and closed-source models.

  • Efficient High-Definition Hybrid TI2V: Wan2.2 open-sources a 5B model built with its advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P and 24 fps, and can run on consumer-grade graphics cards such as the RTX 4090. It is described as one of the fastest 720P@24fps models currently available, suitable for both industrial and academic use.
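To make the 16×16×4 compression ratio concrete, the sketch below computes the latent grid for a 720P clip. It assumes the factors mean 16× downsampling in height, 16× in width, and 4× in time, and it assumes a causal-VAE convention in which the first frame is encoded alone and the remaining frames are grouped in chunks of 4. The 1280×720 frame size and the 5-second duration are illustrative choices, and `latent_shape` is a hypothetical helper, not part of any Wan API.

```python
from typing import Tuple

# Assumed meaning of "16x16x4": 16x height, 16x width, 4x time.
SPATIAL_FACTOR = 16
TEMPORAL_FACTOR = 4


def latent_shape(width: int, height: int, num_frames: int) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a video.

    Assumption: the first frame is encoded on its own and the remaining
    frames are grouped in chunks of TEMPORAL_FACTOR (a causal-VAE
    convention), so num_frames should be 1 + 4k.
    """
    if width % SPATIAL_FACTOR or height % SPATIAL_FACTOR:
        raise ValueError("width and height must be multiples of 16")
    if (num_frames - 1) % TEMPORAL_FACTOR:
        raise ValueError("num_frames must be 1 + a multiple of 4")
    return (
        1 + (num_frames - 1) // TEMPORAL_FACTOR,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


# 5 seconds at 24 fps, plus the leading frame under the assumption above.
frames = 5 * 24 + 1
print(latent_shape(1280, 720, frames))  # (31, 45, 80)
```

Under these assumptions, 121 RGB frames of 1280×720 pixels shrink to a 31×45×80 latent grid, which is what makes 720P generation tractable on a single consumer GPU.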

Key Features of Wan 2.2

  • Cinematic-level aesthetic control: deeply integrates professional film-industry aesthetic standards and supports multi-dimensional visual control over lighting, color, composition, and more;
  • Large-scale complex motion: reproduces a wide range of complex motions while improving motion smoothness and controllability;
  • Precise semantic compliance: excels at complex scenes and multi-object generation, more faithfully realizing users' creative intent.

The model supports multiple generation modes, such as text-to-video and image-to-video, making it suitable for content creation, artistic creation, education and training, and other application scenarios.

Model Highlights

  • Cinematic-level Aesthetic Control: Professional camera language; supports multi-dimensional visual control over lighting, color, and composition
  • Large-scale Complex Motion: Smoothly reproduces a wide range of complex motions, with improved motion controllability and naturalness
  • Precise Semantic Compliance: Understands complex scenes and generates multiple objects, more faithfully realizing creative intent

Detailed Specifications

Overview:

Model Provider: QWEN
Model Type: image-to-video
Deployment: Inference API; Playground
Pricing: $0.3000/second
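A quick way to budget a job is to multiply the listed rate by the clip length. The sketch below assumes billing is linear in seconds of generated video, which the "/second" unit suggests but this page does not spell out; `estimate_cost` is a hypothetical helper, and `Decimal` is used so currency amounts are not subject to binary floating-point rounding.

```python
from decimal import Decimal

# Listed rate on this page; assumed to be charged per second of generated video.
PRICE_PER_SECOND = Decimal("0.3000")


def estimate_cost(video_seconds: int) -> Decimal:
    """Estimate the charge for one clip (assumption: linear per-second billing)."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return PRICE_PER_SECOND * video_seconds


print(estimate_cost(5))  # 1.5000
```

Under that assumption, a 5-second clip would cost $1.50; check actual billing rules (minimum durations, rounding) with the provider.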

Main Features:

Size Limit: Max width × height (custom)
LoRA Support: No
Seed Options: N/A

Create Your Next Masterpiece

Get started with 300+ models, only on Atlas Cloud.