alibaba/wan-2.5/text-to-video

Wan 2.5 Text-to-Video API by Alibaba

Text-to-video

A speed-optimized text-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.

Each run costs $0.035. $10 covers about 285 runs.
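
The run estimate follows from dividing the budget by the per-run price. A minimal sketch of that arithmetic, assuming a flat per-run price with no other charges; `runs_for_budget` is a hypothetical helper name, not part of any Atlas Cloud API:

```python
from decimal import Decimal


def runs_for_budget(budget_usd: str, price_per_run_usd: str) -> int:
    """Return how many whole runs a budget covers at a flat per-run price."""
    budget = Decimal(budget_usd)
    price = Decimal(price_per_run_usd)
    if price <= 0:
        raise ValueError("price_per_run_usd must be positive")
    # Floor division: a partially funded run does not count.
    return int(budget // price)


print(runs_for_budget("10", "0.035"))
```

Decimal strings are used instead of floats so the division is not affected by binary rounding.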

Code example
import os
import time

import requests

# Read the API key from the environment. Python does not expand
# "$ATLASCLOUD_API_KEY" inside a string literal.
API_KEY = os.environ["ATLASCLOUD_API_KEY"]

# Step 1: Start video generation
generate_url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}
data = {
    "model": "alibaba/wan-2.5/text-to-video",  # Required. model name
    "audio": "example_value",  # Optional. Audio URL to guide generation; replace with a real URL or remove
    "duration": 5,  # The duration of the generated media in seconds. options: 5 | 10
    "enable_prompt_expansion": False,  # If set to true, the prompt optimizer will be enabled
    "negative_prompt": "example_value",  # Negative prompt for the generation
    "prompt": "A beautiful sunset over the ocean with gentle waves",  # Required. The prompt for generating the output
    "seed": -1,  # The random seed to use for the generation. -1 means a random seed
    "size": "1920*1080",  # Required. The size of the generated media in pixels (width*height)
    "generate_audio": True,  # Whether to automatically add audio to the generated video
}

generate_response = requests.post(generate_url, headers=headers, json=data)
generate_response.raise_for_status()  # Surface HTTP errors before parsing the body
generate_result = generate_response.json()
prediction_id = generate_result["data"]["id"]

# Step 2: Poll for result
poll_url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"

def check_status():
    while True:
        response = requests.get(poll_url, headers={"Authorization": f"Bearer {API_KEY}"})
        result = response.json()

        if result["data"]["status"] in ["completed", "succeeded"]:
            print("Generated video:", result["data"]["outputs"][0])
            return result["data"]["outputs"][0]
        elif result["data"]["status"] == "failed":
            # .get avoids a KeyError when the error field is absent
            raise Exception(result["data"].get("error") or "Generation failed")
        else:
            # Still processing, wait 2 seconds
            time.sleep(2)

video_url = check_status()

Install

Install the required package for your language.

pip install requests

Authentication

All API requests require authentication with an API key. You can get your API key from the Atlas Cloud dashboard.

export ATLASCLOUD_API_KEY="your-api-key-here"

HTTP headers

import os

API_KEY = os.environ.get("ATLASCLOUD_API_KEY")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

Keep your API key secure

Do not expose your API key in client-side code or public repositories. Use environment variables or a backend proxy instead.
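
One way to follow this advice is to fail fast when the environment variable is missing, rather than sending a request with an empty token. A minimal sketch; `load_api_key` and `auth_headers` are hypothetical helper names, not part of any Atlas Cloud library:

```python
import os


def load_api_key(env=None) -> str:
    """Return the Atlas Cloud API key from the environment, or raise if unset."""
    env = os.environ if env is None else env
    key = env.get("ATLASCLOUD_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ATLASCLOUD_API_KEY is not set")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the JSON request headers used throughout this page."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.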

Send a request

import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "your-model",
    "prompt": "A beautiful landscape"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Submit a request

Submits an asynchronous generation request. The API returns a prediction ID that you can use to check the status and retrieve the result.

POST /api/v1/model/generateVideo

Request body

import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}

data = {
    "model": "alibaba/wan-2.5/text-to-video",
    "prompt": "A beautiful sunset over the ocean with gentle waves",
    "size": "1920*1080"  # Listed as required in the input schema
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

print(f"Prediction ID: {result['data']['id']}")
print(f"Status: {result['data']['status']}")

Response

{
  "code": 200,
  "data": {
    "id": "pred_abc123",
    "status": "processing",
    "model": "model-name",
    "created_at": "2025-01-01T00:00:00Z"
  }
}
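
The prediction ID sits under `data.id` in the response above. A small sketch of pulling it out defensively; the `code` check assumes that a value other than 200 signals an error, which the example response suggests but this page does not state explicitly, and `extract_prediction_id` is a hypothetical helper:

```python
def extract_prediction_id(payload: dict) -> str:
    """Return data.id from a submit response, raising on unexpected shapes."""
    # Assumption: a "code" other than 200 means the request was not accepted.
    code = payload.get("code", 200)
    if code != 200:
        raise ValueError(f"unexpected response code: {code}")
    data = payload.get("data") or {}
    prediction_id = data.get("id")
    if not prediction_id:
        raise ValueError("response has no data.id")
    return prediction_id


example = {
    "code": 200,
    "data": {
        "id": "pred_abc123",
        "status": "processing",
        "model": "model-name",
        "created_at": "2025-01-01T00:00:00Z",
    },
}
print(extract_prediction_id(example))
```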

Check status

Poll the prediction endpoint to check the current status of your request.

GET /api/v1/model/prediction/{prediction_id}

Polling example

import os
import time

import requests

prediction_id = "pred_abc123"
url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"
# Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}

while True:
    response = requests.get(url, headers=headers)
    result = response.json()
    status = result["data"]["status"]
    print(f"Status: {status}")

    if status in ["completed", "succeeded"]:
        output_url = result["data"]["outputs"][0]
        print(f"Output URL: {output_url}")
        break
    elif status == "failed":
        print(f"Error: {result['data'].get('error', 'Unknown')}")
        break

    time.sleep(3)

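Status values

processing: The request is still being processed.

completed: Generation is complete. Outputs are available.

succeeded: Generation succeeded. Outputs are available.

failed: Generation failed. Check the error field.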
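
The polling examples on this page loop until a terminal status arrives, with no upper bound on waiting. A sketch of the same loop with a deadline; the `fetch` callable is injected so the logic can be exercised without network access, and the timeout and interval defaults are arbitrary assumptions rather than documented limits:

```python
import time

TERMINAL_OK = {"completed", "succeeded"}
TERMINAL_FAILED = {"failed"}


def wait_for_prediction(fetch, timeout_s=300.0, interval_s=3.0, sleep=time.sleep):
    """Call fetch() until the prediction reaches a terminal status.

    fetch must return the parsed JSON body of the prediction endpoint.
    Returns the "data" object on success; raises on failure or timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch()["data"]
        status = data["status"]
        if status in TERMINAL_OK:
            return data
        if status in TERMINAL_FAILED:
            raise RuntimeError(data.get("error") or "Generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

With `requests`, `fetch` could be `lambda: requests.get(url, headers=headers).json()`.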

Completed response

{
  "data": {
    "id": "pred_abc123",
    "status": "completed",
    "outputs": [
      "https://storage.atlascloud.ai/outputs/result.mp4"
    ],
    "metrics": {
      "predict_time": 45.2
    },
    "created_at": "2025-01-01T00:00:00Z",
    "completed_at": "2025-01-01T00:00:10Z"
  }
}
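
A sketch of reading the completed response above: it pulls the first output URL and computes the wall-clock time between `created_at` and `completed_at`. The timestamp format is assumed to match the example (ISO 8601 with a trailing `Z`), and `summarize_completed` is a hypothetical helper:

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only accepts 'Z' directly from Python 3.11 onward.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_completed(payload: dict) -> dict:
    """Extract the first output URL and timing details from a completed response."""
    data = payload["data"]
    outputs = data.get("outputs") or []
    if not outputs:
        raise ValueError("completed response has no outputs")
    elapsed = parse_timestamp(data["completed_at"]) - parse_timestamp(data["created_at"])
    return {
        "video_url": outputs[0],
        "elapsed_seconds": elapsed.total_seconds(),
        "predict_time": data.get("metrics", {}).get("predict_time"),
    }
```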

Upload a file

Upload a file to Atlas Cloud storage and get a URL that you can use in API requests. Uploads use multipart/form-data.

POST /api/v1/model/uploadMedia

Upload example

import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/uploadMedia"
# Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}

with open("image.png", "rb") as f:
    files = {"file": ("image.png", f, "image/png")}
    response = requests.post(url, headers=headers, files=files)

result = response.json()
download_url = result["data"]["download_url"]
print(f"File URL: {download_url}")

Response

{
  "data": {
    "download_url": "https://storage.atlascloud.ai/uploads/abc123/image.png",
    "file_name": "image.png",
    "content_type": "image/png",
    "size": 1024000
  }
}
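
The `download_url` returned here can be passed wherever the generation API expects a URL. For this model the only URL input is `audio`, so wiring the two together might look like the sketch below. It assumes the upload endpoint accepts audio files the same way it accepts the image in the example, which this page does not confirm; `payload_with_uploaded_audio` is a hypothetical helper:

```python
def payload_with_uploaded_audio(upload_response: dict, prompt: str,
                                size: str = "1920*1080") -> dict:
    """Build a generateVideo request body that uses an uploaded file as audio."""
    audio_url = upload_response["data"]["download_url"]
    return {
        "model": "alibaba/wan-2.5/text-to-video",
        "prompt": prompt,
        "size": size,
        "audio": audio_url,
    }
```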

Input schema

The following parameters are available in the request body.

Total: 9 | Required: 3 | Optional: 6

model (string, required)

model name

Default: "alibaba/wan-2.5/text-to-video"

audio (string)

Audio URL to guide generation (optional).

duration (integer)

The duration of the generated media in seconds.

Default: 5

Options: 5, 10

enable_prompt_expansion (boolean)

If set to true, the prompt optimizer will be enabled.

Default: false

negative_prompt (string)

Negative prompt for the generation.

prompt (string, required)

The prompt for generating the output.

seed (integer)

The random seed to use for the generation. -1 means a random seed will be used.

Default: -1

size (string, required)

The size of the generated media in pixels (width*height).

Default: "1920*1080"

Options: 832*480, 480*832, 624*624, 1280*720, 720*1280, 960*960, 1088*832, 832*1088, 1920*1080, 1080*1920, 1440*1440, 1632*1248, 1248*1632

generate_audio (boolean)

Whether to automatically add audio to the generated video.

Default: true
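
Checking a request body against these constraints before sending it can catch mistakes without spending a run. A minimal client-side sketch using only the required fields and options listed above; how the server itself treats invalid values is not documented here, and `validate_payload` is a hypothetical helper:

```python
ALLOWED_SIZES = {
    "832*480", "480*832", "624*624", "1280*720", "720*1280", "960*960",
    "1088*832", "832*1088", "1920*1080", "1080*1920", "1440*1440",
    "1632*1248", "1248*1632",
}
ALLOWED_DURATIONS = {5, 10}
REQUIRED_FIELDS = ("model", "prompt", "size")


def validate_payload(payload: dict) -> list:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not payload.get(name):
            problems.append(f"missing required field: {name}")
    size = payload.get("size")
    if size and size not in ALLOWED_SIZES:
        problems.append(f"unsupported size: {size}")
    duration = payload.get("duration", 5)
    if duration not in ALLOWED_DURATIONS:
        problems.append(f"unsupported duration: {duration}")
    seed = payload.get("seed", -1)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        problems.append("seed must be an integer")
    return problems
```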

Request body example

{
  "model": "alibaba/wan-2.5/text-to-video",
  "duration": 5,
  "enable_prompt_expansion": false,
  "prompt": "A beautiful landscape",
  "seed": -1,
  "size": "1920*1080",
  "generate_audio": true
}

Output schema

The API returns a prediction response containing the generated output URLs.

created_at (string)

ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”).

id (string)

Unique identifier for the prediction, the ID of the prediction to get.

model (string)

Model ID used for the prediction.

outputs (array)

Array of URLs to the generated content (empty when status is not completed).

status (string)

Status of the task: created, processing, completed, or failed.

Response example

{
  "id": "pred_abc123",
  "status": "completed",
  "model": "model-name",
  "outputs": [
    "https://storage.atlascloud.ai/outputs/result.mp4"
  ],
  "metrics": {
    "predict_time": 45.2
  },
  "created_at": "2025-01-01T00:00:00Z",
  "completed_at": "2025-01-01T00:00:10Z"
}
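
The response examples on this page appear in two shapes: some wrap the prediction in a `data` object and this one does not. A sketch of a reader that tolerates both, assuming those are the only two shapes in use; `Prediction` and `parse_prediction` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prediction:
    id: str
    status: str
    model: Optional[str] = None
    outputs: List[str] = field(default_factory=list)
    created_at: Optional[str] = None


def parse_prediction(payload: dict) -> Prediction:
    """Build a Prediction from either a bare or a data-wrapped response."""
    body = payload["data"] if isinstance(payload.get("data"), dict) else payload
    return Prediction(
        id=body["id"],
        status=body["status"],
        model=body.get("model"),
        outputs=list(body.get("outputs") or []),
        created_at=body.get("created_at"),
    )
```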

Atlas Cloud Skills

Atlas Cloud Skills integrates 400+ AI models directly into your AI coding assistant. Install it with one command, then generate images and videos or chat with LLMs in natural language.

Supported clients

Claude Code

OpenAI Codex

Gemini CLI

Cursor

Windsurf

VS Code

Trae

GitHub Copilot

Cline

Roo Code

Amp

Goose

Replit

40+ supported clients

Install

npx skills add AtlasCloudAI/atlas-cloud-skills

Set up your API key

Get an API key from the Atlas Cloud dashboard and set it as an environment variable.

export ATLASCLOUD_API_KEY="your-api-key-here"

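Features

After installation, you can access all Atlas Cloud models from your AI assistant using natural language.

Image generation: Generate images with models such as Nano Banana 2 and Z-Image.

Video creation: Create videos from text or images with Kling, Vidu, Veo, and others.

LLM chat: Chat with large language models such as Qwen and DeepSeek.

Media upload: Upload local files for image-editing and image-to-video workflows.

Learn more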

github.com/AtlasCloudAI/atlas-cloud-skills

MCP Server

Atlas Cloud MCP Server connects your IDE to 400+ AI models through the Model Context Protocol. It works with any MCP-compatible client.

Supported clients

Cursor

VS Code

Windsurf

Claude Code

OpenAI Codex

Gemini CLI

Cline

Roo Code

100+ supported clients

Install

npx -y atlascloud-mcp

Configuration

Add the following configuration to your IDE's MCP settings file.

{
  "mcpServers": {
    "atlascloud": {
      "command": "npx",
      "args": [
        "-y",
        "atlascloud-mcp"
      ],
      "env": {
        "ATLASCLOUD_API_KEY": "your-api-key-here"
      }
    }
  }
}
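
If the MCP settings file already lists other servers, the `atlascloud` entry has to be merged in rather than pasted over the file. A sketch of that merge on an in-memory dict; where the settings file lives varies by client and is not covered here, and `add_atlascloud_server` is a hypothetical helper:

```python
import copy
import json


def add_atlascloud_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP config with the atlascloud server entry added."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["atlascloud"] = {
        "command": "npx",
        "args": ["-y", "atlascloud-mcp"],
        "env": {"ATLASCLOUD_API_KEY": api_key},
    }
    return merged


# "other" stands in for whatever servers are already configured.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_atlascloud_server(existing, "your-api-key-here"), indent=2))
```

The input dict is deep-copied so the caller's configuration is left unchanged until the result is written back.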

Available tools

atlas_generate_image: Generate images from text prompts.

atlas_generate_video: Create videos from text or images.

atlas_chat: Chat with large language models.

atlas_list_models: Browse the 400+ available AI models.

atlas_quick_generate: Automatically select the best model and create content in one step.

atlas_upload_media: Upload local files for API workflows.

Learn more

github.com/AtlasCloudAI/mcp-server

API schema

{
  "info": {
    "title": "AtlasCloud API",
    "version": "1.0.0",
    "description": "The AtlasCloud API."
  },
  "openapi": "3.0.0",
  "paths": {
    "/api/v1/model/generateVideo": {
      "post": {
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/Input"
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/PredictionResponse"
                }
              }
            },
            "description": "The request status."
          }
        }
      },
      "x-api-name": "model_run"
    },
    "/api/v1/model/result/{request_id}": {
      "get": {
        "parameters": [
          {
            "in": "path",
            "name": "request_id",
            "required": true,
            "schema": {
              "description": "Request ID",
              "type": "string"
            }
          }
        ],
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/PredictionResponse"
                }
              }
            },
            "description": "Result of the request."
          }
        }
      },
      "x-api-name": "model_result"
    }
  },
  "components": {
    "schemas": {
      "Input": {
        "properties": {
          "model": {
            "type": "string",
            "description": "model name",
            "default": "alibaba/wan-2.5/text-to-video"
          },
          "audio": {
            "description": "Audio URL to guide generation (optional).",
            "type": "string"
          },
          "duration": {
            "default": 5,
            "description": "The duration of the generated media in seconds.",
            "enum": [
              5,
              10
            ],
            "type": "integer",
            "x-ui-component": "select"
          },
          "enable_prompt_expansion": {
            "default": false,
            "description": "If set to true, the prompt optimizer will be enabled.",
            "type": "boolean"
          },
          "negative_prompt": {
            "description": "Negative prompt for the generation.",
            "type": "string"
          },
          "prompt": {
            "description": "The prompt for generating the output.",
            "type": "string",
            "x-rows": 10,
            "x-ui-component": "textarea"
          },
          "seed": {
            "default": -1,
            "description": "The random seed to use for the generation. -1 means a random seed will be used.",
            "type": "integer"
          },
          "size": {
            "default": "1920*1080",
            "description": "The size of the generated media in pixels (width*height).",
            "enum": [
              "832*480",
              "480*832",
              "624*624",
              "1280*720",
              "720*1280",
              "960*960",
              "1088*832",
              "832*1088",
              "1920*1080",
              "1080*1920",
              "1440*1440",
              "1632*1248",
              "1248*1632"
            ],
            "type": "string"
          },
          "generate_audio": {
            "default": true,
            "description": "Whether to automatically add audio to the generated video.",
            "type": "boolean"
          }
        },
        "required": [
          "model",
          "prompt",
          "size"
        ],
        "type": "object",
        "x-order-properties": [
          "model",
          "prompt",
          "negative_prompt",
          "audio",
          "size",
          "duration",
          "enable_prompt_expansion",
          "generate_audio",
          "seed"
        ]
      },
      "PredictionResponse": {
        "properties": {
          "created_at": {
            "description": "ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”).",
            "format": "date-time",
            "type": "string"
          },
          "has_nsfw_contents": {
            "description": "Array of boolean values indicating NSFW detection for each output.",
            "items": {
              "type": "boolean"
            },
            "type": "array"
          },
          "id": {
            "description": "Unique identifier for the prediction, the ID of the prediction to get.",
            "type": "string"
          },
          "model": {
            "description": "Model ID used for the prediction.",
            "type": "string"
          },
          "outputs": {
            "description": "Array of URLs to the generated content (empty when status is not completed).",
            "items": {
              "type": "object"
            },
            "type": "array"
          },
          "status": {
            "description": "Status of the task: created, processing, completed, or failed.",
            "type": "string"
          },
          "urls": {
            "description": "Object containing related API endpoints.",
            "type": "object"
          }
        },
        "type": "object"
      }
    }
  },
  "servers": [
    {
      "url": "https://api.atlascloud.ai"
    }
  ]
}
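
Because the schema is machine-readable, its constraints can be read programmatically instead of copied by hand. A sketch that lists the required fields, defaults, and enum options of the `Input` schema from an already-parsed OpenAPI document; it relies only on keys visible in the schema above, and `describe_input` is a hypothetical helper:

```python
def describe_input(spec: dict) -> dict:
    """Summarize required fields, defaults, and enum options of the Input schema."""
    schema = spec["components"]["schemas"]["Input"]
    properties = schema.get("properties", {})
    return {
        "required": list(schema.get("required", [])),
        "defaults": {n: p["default"] for n, p in properties.items() if "default" in p},
        "enums": {n: p["enum"] for n, p in properties.items() if "enum" in p},
    }


# Trimmed copy of the schema above, for illustration only.
spec = {
    "components": {"schemas": {"Input": {
        "properties": {
            "duration": {"default": 5, "enum": [5, 10], "type": "integer"},
            "prompt": {"type": "string"},
            "seed": {"default": -1, "type": "integer"},
        },
        "required": ["model", "prompt", "size"],
    }}}
}
summary = describe_input(spec)
print(summary["required"])
print(summary["enums"])
```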

LLM-friendly prompt template

# alibaba/wan-2.5/text-to-video

> A speed-optimized text-to-video option that prioritizes lower latency while retaining strong visual fidelity. Ideal for iteration, batch generation, and prompt testing.


## Overview

- **Submit endpoint (POST)**: `https://api.atlascloud.ai/api/v1/model/generateVideo` — start an async generation; returns a `prediction_id`
- **Poll endpoint (GET)**: `https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}` — poll this until the prediction finishes
- **Model ID**: `alibaba/wan-2.5/text-to-video`


## API Information

This model can be used via our HTTP API or more conveniently via our client libraries.
See the input and output schema below, as well as the usage examples.


### Input Schema

The API accepts the following input parameters:

- **`model`** (`string`, _required_):
  model name
  - Default: `"alibaba/wan-2.5/text-to-video"`

- **`prompt`** (`string`, _required_):
  The prompt for generating the output.

- **`negative_prompt`** (`string`, _optional_):
  Negative prompt for the generation.

- **`audio`** (`string`, _optional_):
  Audio URL to guide generation (optional).

- **`size`** (`string`, _required_):
  The size of the generated media in pixels (width*height).
  - Default: `"1920*1080"`
  - Options: "832*480", "480*832", "624*624", "1280*720", "720*1280", "960*960", "1088*832", "832*1088", "1920*1080", "1080*1920", "1440*1440", "1632*1248", "1248*1632"

- **`duration`** (`integer`, _optional_):
  The duration of the generated media in seconds.
  - Default: `5`
  - Options: 5, 10

- **`enable_prompt_expansion`** (`boolean`, _optional_):
  If set to true, the prompt optimizer will be enabled.
  - Default: `false`

- **`generate_audio`** (`boolean`, _optional_):
  Whether to automatically add audio to the generated video.
  - Default: `true`

- **`seed`** (`integer`, _optional_):
  The random seed to use for the generation. -1 means a random seed will be used.
  - Default: `-1`



**Required Parameters Example**:

```json
{
  "model": "alibaba/wan-2.5/text-to-video",
  "prompt": "",
  "size": "1920*1080"
}
```


**Full Example**:

```json
{
  "model": "alibaba/wan-2.5/text-to-video",
  "prompt": "",
  "negative_prompt": "",
  "audio": "",
  "size": "1920*1080",
  "duration": 5,
  "enable_prompt_expansion": false,
  "generate_audio": true,
  "seed": -1
}
```
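
The `size` value is a string in `width*height` form, so code that needs the numeric dimensions has to split it. A minimal sketch; `parse_size` and `orientation` are hypothetical helpers, not part of the API:

```python
def parse_size(size: str) -> tuple:
    """Split a "width*height" string into integer width and height."""
    width_text, sep, height_text = size.partition("*")
    if not sep:
        raise ValueError(f"expected 'width*height', got {size!r}")
    return int(width_text), int(height_text)


def orientation(size: str) -> str:
    """Classify a size string as landscape, portrait, or square."""
    width, height = parse_size(size)
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"
```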


### Output Schema

The API returns the following output format:


- **`created_at`** (`string`, _optional_):
  ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”).

- **`has_nsfw_contents`** (`array[boolean]`, _optional_):
  Array of boolean values indicating NSFW detection for each output.

- **`id`** (`string`, _optional_):
  Unique identifier for the prediction, the ID of the prediction to get.

- **`model`** (`string`, _optional_):
  Model ID used for the prediction.

- **`outputs`** (`array[object]`, _optional_):
  Array of URLs to the generated content (empty when status is not completed).

- **`status`** (`string`, _optional_):
  Status of the task: created, processing, completed, or failed.

- **`urls`** (`object`, _optional_):
  Object containing related API endpoints.



**Example Response**:

```json
{
  "created_at": "",
  "has_nsfw_contents": [],
  "id": "",
  "model": "",
  "outputs": [],
  "status": "",
  "urls": {}
}
```


## Usage Examples

### cURL

```bash
# Step 1: Start generation (async)
curl -X POST "https://api.atlascloud.ai/api/v1/model/generateVideo" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "alibaba/wan-2.5/text-to-video",
  "prompt": "",
  "negative_prompt": "",
  "audio": "",
  "size": "1920*1080",
  "duration": 5,
  "enable_prompt_expansion": false,
  "generate_audio": true,
  "seed": -1
}'

# Response will contain: {"code": 200, "data": {"id": "prediction_id", "status": "processing"}}

# Step 2: Poll for result (replace {prediction_id} with the id returned above)
curl -X GET "https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}" \
  -H "Authorization: Bearer $ATLASCLOUD_API_KEY"

# Keep polling until status is "completed", "succeeded" or "failed"
# When completed, outputs will contain the generated content URL(s)
```

## Additional Resources

### Documentation

- [Model Playground](https://www.atlascloud.ai/models/alibaba/wan-2.5/text-to-video)

A middle-aged man sitting at a wooden desk in a cozy study room, surrounded by bookshelves and a warm lamp glow. He opens an old book and reads aloud with a calm, deep voice: 'History teaches us more than just facts… it shows us who we are.' The room has subtle background sounds: pages turning, the faint ticking of a clock, and distant rain against the window.

A young man in his early 30s sits in a modern studio, wearing a navy blazer and white shirt. Soft lighting illuminates his face. He speaks directly to the camera, his lips moving naturally as he says: “Welcome to today’s interview. We’re going to explore how AI is changing our daily lives.” His gestures are subtle, occasionally raising his hands for emphasis, creating a professional and engaging tone.

A cinematic opening sequence of a sci-fi movie: a spaceship travels across the galaxy, and the movie title “星河远征 · Galactic Odyssey” emerges in golden 3D letters, with flawless kerning and no distortion, floating stably in space as the camera rotates.

A handsome, muscular man with well-defined abs is catching his breath after an intense workout. Sweat drips down his torso. He is shirtless, wearing only black athletic shorts, and is leaning against gym equipment. The lighting comes from the upper side, highlighting the contours of his chest and arms. The scene is filled with a raw, masculine energy, hyper-realistic, high-contrast lighting.

A graceful ballerina with her hair in a messy bun, performing a powerful and emotional contemporary ballet routine. She is in a minimalist, dark art studio. Abstract patterns of light and shadow, projected from a hidden source, dance across her body and the surrounding walls, constantly shifting with her movements. The camera focuses on the tension in her muscles and the expressive gestures of her hands. A single, dramatic slow-motion shot captures her mid-air leap, with the light patterns swirling around her like a galaxy. Moody, artistic, high contrast.

A young couple sitting on a park bench during sunset. The woman leans her head on the man’s shoulder. He whispers softly: 'No matter where we go, I’ll always be here with you.' The sound includes the rustling of leaves, distant laughter of children playing, and the gentle hum of cicadas in the evening air.

A low-angle panning shot of a concrete wall under a highway overpass at night. Graffiti of a young man comes to life and starts rapping. The style is a dynamic blend of 2D street art animation on a realistic, dark, cinematic background. Cityscape is visible in the distance.

A 3D animated, anthropomorphic badger wearing a brown leather vest is angrily sweeping yellow autumn leaves from the doorway of his rustic wooden cabin. The style is reminiscent of a Pixar film, with detailed fur and expressive animation. Sunny day, lush green meadow with a forest in the background.

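Why choose Wan 2.5

Cost-effective

Despite Google's recent price cut, Veo 3 remains expensive. Wan 2.5 is lightweight and cost-effective, giving creators more options while significantly reducing production costs.

One-step generation, end-to-end sync

Wan 2.5 needs no separate voice recording or manual lip alignment. Enter a clear, structured prompt and it generates a complete video with audio, narration, and lip-sync in one pass. Faster and simpler.

Multilingual support

When the prompt is in Chinese, Wan 2.5 reliably generates video with synchronized audio and visuals. Veo 3, by contrast, often reports "unknown language" for Chinese prompts.

Precise character reproduction

Wan 2.5 excels at reproducing character traits, accurately rendering appearance, expressions, and movement style. This makes characters in generated videos more recognizable and distinctive, strengthening storytelling and immersion.

Artistic style rendering

Supports Studio Ghibli-style rendering with hand-painted watercolor textures and animation effects, delivering a warm, dreamlike visual experience that adds artistic appeal and narrative depth.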

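Who benefits?

Marketing teams

For product launches, promotional campaigns, and brand marketing, Wan 2.5 generates high-quality videos quickly, making production simple and efficient.

Create product demos and tutorials without adjustment overhead
Social media marketing with multilingual subtitles and lip-sync
AI-generated content lets teams focus on strategy and creativity

Bottom line: Production has never been this easy, fast, and smart. Wan 2.5 is a secret weapon for marketing!

Global enterprises

Offers multinational companies an ideal content localization solution, making production easier and more efficient.

Multilingual video support through prompt recognition
One-click generation of lip-synced subtitles and narration
Rapid content localization for global markets

Bottom line: Cross-border content production has never been this easy, fast, and smart.

Story creators / YouTubers

Creators can use Wan 2.5 to speed up video production while still achieving high-quality output.

Immersive storytelling through precise character movement and expressions
Higher posting efficiency through less editing and post-production time
Varied content, from short videos to animated stories

Corporate training teams

Make corporate training more efficient and engaging with Wan 2.5.

Professional videos replace dull text materials
Quickly create operation demos and training tutorials
Consistent style and standardized output for global rollout

Freelance creators / small studios

Wan 2.5 unlocks creativity without expensive equipment or actors. AI generates everything efficiently.

Take on varied work, from short films to social media content
"One-click generation" from idea to finished piece
High-quality content without expensive equipment or professional actors

Bottom line: With Wan 2.5, production becomes easier, freer, and more exciting. Every attempt brings a surprise!

Educational institutions / online course creators

Turn creativity into reality without high costs. Wan 2.5 makes high-quality content production easy and economical.

Try a range of styles, from short films to promotional videos
Higher production efficiency from concept to finished product
High-quality content without expensive equipment or professional talent

Bottom line: With Wan 2.5, production becomes easy, efficient, and free. Every attempt brings brilliant results!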

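Core features

One-step audio-video generation

Generates a complete video with synchronized audio, narration, and lip-sync in a single process

Dual-character sync

Generates two characters at once, with synchronized movement, expressions, and lip-sync for natural interaction

Professional quality

High-quality video output with realistic character expressions and precise lip-sync

Multilingual support

Strong handling of Chinese prompts and stable generation of multilingual content

High cost-effectiveness

Significantly lower cost than competitors while maintaining professional quality

Character trait reproduction

Precisely reproduces character appearance, expressions, and movement style with high fidelity and individuality

Artistic style rendering

Supports a variety of artistic styles, including Studio Ghibli-style hand-painted watercolor textures

Immersive scenes

Well suited to dialogue scenes, interviews, and two-person short films, with natural audio-visual consistency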

Digital Human Sync

Study Room Scholar

Middle-aged man reading with perfect lip-sync in a warm study environment

Lip-sync with audio | Environmental sounds | Character emotion

Prompt

Dual Character Scene

Park Sunset Romance

Couple interaction with synchronized dual character actions and expressions

Dual character sync | Natural interaction | Ambient soundscape

Prompt

A young couple sitting on a park bench during sunset. The woman leans her head on the man's shoulder. He whispers softly: 'No matter where we go, I'll always be here with you.' The sound includes the rustling of leaves, distant laughter of children playing, and the gentle hum of cicadas in the evening air.

Character Restoration

Ballet Performance Art

Precise character trait restoration with artistic movement and expression

Character trait restoration | Movement precision | Artistic lighting

Prompt

Artistic Style Rendering

Ghibli Forest Magic

Studio Ghibli-inspired animation with hand-painted watercolor texture

Ghibli art style | Hand-painted texture | Magical atmosphere

Prompt

Studio Ghibli-inspired anime style. A young girl with a straw hat lies peacefully in a sun-dappled magical forest, surrounded by friendly, glowing forest spirits (Kodama). A gentle breeze rustles the leaves of the giant, ancient trees. The air is filled with sparkling dust motes, illuminated by shafts of sunlight. The art style is soft, with a hand-painted watercolor texture. The scene feels serene, magical, and heartwarming.

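Best for

🎬 Video production

📢 Marketing content

🎓 Educational videos

📱 Social media

🌐 Multilingual content

💼 Corporate training

🎭 Entertainment

💃 Performing arts

🎨 Animation and anime

📚 Storytelling

👥 Dual-character videos

🎙️ Interviews

📺 Broadcast and media

Technical specifications

Model type: Synchronized audio-video generation

Key features: Audio-video sync, character reproduction, artistic rendering, multilingual

Language support: Chinese, English, and others

Output quality: Professional HD video with audio

Generation speed: Fast one-step generation

API integration: RESTful API with comprehensive documentation

Experience Wan 2.5: revolutionize your video production

Join thousands of creators and businesses transforming video content creation with synchronized audio-video generation technology.

🎬 One-step audio-video sync

🌍 Multilingual support

⚡ High cost-effectiveness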

Wan 2.5: A next-generation AI video generation model developed by Alibaba Wanxiang.

Model Card Overview

Field	Description
Model Name	Wan 2.5
Developed By	Alibaba Group
Release Date	September 24, 2025
Model Type	Generative AI, Video Foundation Model
Related Links	Official Website: https://wan.video/, Hugging Face: https://huggingface.co/Wan-AI, Technical Paper (Wan Series): https://arxiv.org/abs/2503.20314

Introduction

Wan 2.5 is a state-of-the-art, open-source video foundation model developed by Alibaba's Wan AI team. It is designed to generate high-quality, cinematic videos complete with synchronized audio directly from text or image prompts. The model represents a significant advancement in the field of generative AI, aiming to lower the barrier for creative video production. Its core contribution lies in its ability to produce coherent, dynamic, and narratively consistent video clips with a high degree of realism and integrated audio-visual elements, such as lip-sync and sound effects, in a single, streamlined process.

Key Features & Innovations

Wan 2.5 introduces several key features that distinguish it from previous models and competitors:

Unified Audio-Visual Synthesis: Unlike many models that require separate steps for video and audio generation, Wan 2.5 creates video with natively synchronized audio, including voice, sound effects, and lip-sync, in one step.
High-Fidelity, High-Resolution Output: The model is capable of generating videos in multiple resolutions, including 480p, 720p, and full 1080p HD, with significant improvements in visual quality and frame-to-frame stability over its predecessors.
Extended Video Duration: Wan 2.5 can generate video clips up to 10 seconds in length, offering more creative flexibility for storytelling compared to other models in its class.
Advanced Cinematic Control: The model demonstrates a sophisticated understanding of cinematic language, allowing for precise control over camera movement, shot composition, and character consistency within scenes.
Open-Source Commitment: Following the precedent set by earlier versions, the Wan series of models, including Wan 2.5, are open-sourced to encourage research, development, and innovation within the broader AI community.

Model Architecture & Technical Details

Wan 2.5 is built upon the Diffusion Transformer (DiT) paradigm, which has become a mainstream approach for high-quality generative tasks. The technical report for the Wan model series outlines a suite of innovations that contribute to its performance.

The architecture includes a novel Variational Autoencoder (VAE) designed for high-efficiency video compression, enabling the model to handle high-resolution video data effectively. The Wan series is available in multiple sizes to balance performance and computational requirements, such as the 1.3B and 14B parameter models detailed for Wan 2.1. The model was trained on a massive, curated dataset comprising billions of images and videos, which enhances its ability to generalize across a wide range of motions, semantics, and aesthetic styles.

Intended Use & Applications

Wan 2.5 is designed for a wide array of applications in creative and commercial fields. Its intended uses include:

Content Creation: Generating short-form videos for social media, marketing campaigns, and digital advertising.
Storytelling and Filmmaking: Creating cinematic scenes, character animations, and narrative sequences for short films and conceptual art.
Prototyping: Rapidly visualizing scripts and storyboards for film, television, and game development.
Personalized Media: Enabling users to create unique, personalized video content from their own ideas and images.

Performance

Wan 2.5 has demonstrated significant performance improvements over previous versions and holds a competitive position against other leading video generation models. Independent reviews and benchmarks provide insight into its capabilities.

Benchmark Scores

A review conducted by Curious Refuge Labs™ evaluated the model's visual generation capabilities across several metrics.

Metric	Score (out of 10)
Prompt Adherence	7.0
Temporal Consistency	6.6
Visual Fidelity	6.5
Motion Quality	5.9
Style & Cinematic Realism	5.7
Overall Score	6.3

These scores indicate strong prompt understanding and a notable improvement in visual quality from Wan 2.2, although it still shows limitations in complex motion and realism compared to top-tier commercial models.
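
The overall score is consistent with an unweighted mean of the five metric scores, rounded to one decimal place. Whether the reviewers computed it that way is an assumption; the sketch below only checks the arithmetic:

```python
scores = {
    "Prompt Adherence": 7.0,
    "Temporal Consistency": 6.6,
    "Visual Fidelity": 6.5,
    "Motion Quality": 5.9,
    "Style & Cinematic Realism": 5.7,
}

# Unweighted mean of the five metrics (assumed aggregation method).
mean_score = sum(scores.values()) / len(scores)
print(round(mean_score, 1))
```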

Explore similar models

Van-2.5 Text-to-video

Convert prompts into cinematic video clips with synchronized sound. Van 2.5 generates 720p/1080p outputs with stable motion, native audio sync, and prompt-faithful visual storytelling.

Van-2.5 Image-to-video

Get animated visuals from your images faster without major quality sacrifice. Perfect for preview workflows, previews at scale, or mass production of animated assets.

HappyHorse-1.1 Reference-to-video

Generates videos from one to nine reference images and a text prompt, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

HappyHorse-1.1 Image-to-video

Animates a first-frame image into video with optional prompt guidance, 720P or 1080P output, and durations from 3 to 15 seconds.

HappyHorse-1.1 Text-to-video

Generates videos from text prompts with HappyHorse 1.1, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

HappyHorse-1.0 Image-to-video

Animates a first-frame image into video with optional prompt guidance, 720P or 1080P output, and durations from 3 to 15 seconds.

HappyHorse-1.0 Text-to-video

Generates videos from text prompts with HappyHorse 1.0, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

HappyHorse-1.0 Video-edit

Edits an input video with text instructions and optional reference images, supporting 720P or 1080P output.

HappyHorse-1.0 Reference-to-video

Generates videos from one to nine reference images and a text prompt, supporting 720P or 1080P output, flexible aspect ratios, and durations from 3 to 15 seconds.

From $0.14/second