alibaba/wan-flf2v

Beeld-naar-Video

Open and Advanced Large-Scale Video Generative Models.

Wan FLF2V

Wan FLF2V (First-Last Frame Video Generation) is an open-source video generation model developed by the Alibaba Tongyi Wanxiang team. Its open-source license is Apache 2.0. Users only need to provide two images as the starting and ending frames, and the model automatically generates intermediate transition frames, outputting a logically coherent and naturally flowing 720p high-definition video.

Core Technical Highlights

Precise First-Last Frame Control: The matching rate of first and last frames reaches 98%, defining video boundaries through starting and ending scenes, intelligently filling intermediate dynamic changes to achieve scene transitions and object morphing effects.
Stable and Smooth Video Generation: Using CLIP semantic features and cross-attention mechanisms, the video jitter rate is reduced by 37% compared to similar models, ensuring natural and smooth transitions.
Multi-functional Creative Capabilities: Supports dynamic embedding of Chinese and English subtitles, generation of anime/realistic/fantasy and other styles, adapting to different creative needs.
720p HD Output: Directly generates 1280×720 resolution videos without post-processing, suitable for social media and commercial applications.
Open-source Ecosystem Support: Model weights, code, and training framework are fully open-sourced, supporting deployment on mainstream AI platforms.

Technical Principles and Architecture

DiT Architecture: Based on diffusion models and Diffusion Transformer architecture, combined with Full Attention mechanism to optimize spatiotemporal dependency modeling, ensuring video coherence.
3D Causal Variational Encoder: Wan-VAE technology compresses HD frames to 1/128 size while retaining subtle dynamic details, significantly reducing memory requirements.
Three-stage Training Strategy: Starting from 480P resolution pre-training, gradually upgrading to 720P, balancing generation quality and computational efficiency through phased optimization.

Gedetailleerde Specificaties

Overzicht:

Modelleverancier:QWEN

Modeltype:image-to-video

Implementatie:Inference API; Playground

Prijzen:$0.06

Belangrijkste Specificaties:

Groottelimiet:Max breedte × hoogte (aangepast)

LoRA-ondersteuning:Nee

Seed-opties:N/A

Creëer Uw Volgende Meesterwerk

Ontdek Vergelijkbare Modellen

NEW

HOT

Beeld-naar-Video

Wan-2.6 Image-to-video Flash

Wan2.6 image to video flash, faster and more cost-effective generation. Intelligent shot scheduling enables multi‑camera storytelling, supports stable multi‑speaker dialogue with more natural and realistic vocal timbres.

Wan-2.6 Video-to-video

A speed-optimized video-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.

Wan-2.6 Image-to-video

A speed-optimized image-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.

Wan-2.6 Text-to-video

A speed-optimized text-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.

$0.1/SEC

$0.07/SEC

-30%

Begin met 300+ Modellen,

Verken alle modellen