
Seedance 2.0 Reference-to-Video API by ByteDance
Multimodal video generation from reference images, videos, and audio. Supports video editing and extension.
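Without a video input, generating 720p video is billed at $0.2419 per second. The rate is $0.0112 per 1,000 tokens, where the token count is (output video height × output video width × (input duration + output duration) × 24) / 1024. When a video input is provided, the rate drops to $0.00688 per 1,000 tokens, which at 720p works out to $0.1486 per second.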
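As a rough check on the pricing figures, the token formula can be evaluated directly. The sketch below is an illustration, not an official billing calculator: the function names are hypothetical, and it assumes that 720p means 1280 × 720 pixels, which the pricing text does not state.

```python
# Rates quoted in the pricing text, in USD per 1,000 tokens.
RATE_PER_1K_TOKENS = 0.0112
RATE_PER_1K_TOKENS_WITH_VIDEO_INPUT = 0.00688


def estimate_tokens(width, height, output_seconds, input_seconds=0.0):
    """Token count: (height * width * (input + output duration) * 24) / 1024."""
    return (height * width * (input_seconds + output_seconds) * 24) / 1024


def estimate_cost(width, height, output_seconds, input_seconds=0.0, has_video_input=False):
    """Estimated cost in USD for one generation (hypothetical helper)."""
    rate = RATE_PER_1K_TOKENS_WITH_VIDEO_INPUT if has_video_input else RATE_PER_1K_TOKENS
    tokens = estimate_tokens(width, height, output_seconds, input_seconds)
    return tokens / 1000 * rate


# Assumption: 720p is 1280 x 720 (not stated in the pricing text).
per_second = estimate_cost(1280, 720, output_seconds=1)
per_second_with_video = estimate_cost(1280, 720, output_seconds=1, has_video_input=True)
print(f"${per_second:.4f} per second")             # $0.2419 per second
print(f"${per_second_with_video:.4f} per second")  # $0.1486 per second
```

Both results match the quoted per-second prices. Because the formula sums input and output durations, a request with a 5-second input video and a 5-second output would be estimated as 10 billable seconds under this reading.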
Code Example
import os
import time

import requests

# Read the API key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string
API_KEY = os.environ["ATLASCLOUD_API_KEY"]

# Step 1: Start video generation
generate_url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
data = {
    "model": "bytedance/seedance-2.0/reference-to-video",
    "prompt": "A beautiful sunset over the ocean with gentle waves",
    "width": 512,
    "height": 512,
    "duration": 3,
    "fps": 24,
}
generate_response = requests.post(generate_url, headers=headers, json=data)
generate_result = generate_response.json()
prediction_id = generate_result["data"]["id"]

# Step 2: Poll for result
poll_url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"

def check_status():
    while True:
        response = requests.get(poll_url, headers={"Authorization": f"Bearer {API_KEY}"})
        result = response.json()
        if result["data"]["status"] in ["completed", "succeeded"]:
            print("Generated video:", result["data"]["outputs"][0])
            return result["data"]["outputs"][0]
        elif result["data"]["status"] == "failed":
            raise Exception(result["data"].get("error") or "Generation failed")
        else:
            # Still processing, wait 2 seconds
            time.sleep(2)

video_url = check_status()
Installation
Install the required dependency.
pip install requests
Authentication
All API requests must be authenticated with an API key. You can get an API key from the Atlas Cloud console.
export ATLASCLOUD_API_KEY="your-api-key-here"
HTTP Headers
import os

API_KEY = os.environ.get("ATLASCLOUD_API_KEY")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}
Never expose your API key in client-side code or public repositories. Use environment variables or a backend proxy.
Submit a Request
import os
import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Read the key from the environment; a literal "$ATLASCLOUD_API_KEY" would be sent as-is
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "your-model",
    "prompt": "A beautiful landscape"
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
Submit a Request
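Submit an asynchronous generation request. The API returns a prediction ID that you can use to check the status and retrieve the result.
POST /api/v1/model/generateVideo
Request Body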
import os
import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "bytedance/seedance-2.0/reference-to-video",
    "input": {
        "prompt": "A beautiful sunset over the ocean with gentle waves"
    }
}
response = requests.post(url, headers=headers, json=data)
result = response.json()
print(f"Prediction ID: {result['id']}")
print(f"Status: {result['status']}")
Response
{
    "id": "pred_abc123",
    "status": "processing",
    "model": "model-name",
    "created_at": "2025-01-01T00:00:00Z"
}
Check Status
Poll the prediction endpoint to check the current status of a request.
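The polling examples on this page loop until a terminal status arrives, so a request that never finishes would block indefinitely. A minimal sketch of a bounded polling loop follows. `poll_until_done` and its `fetch` parameter are hypothetical names, and the response shape (`data.status`, `data.outputs`, `data.error`) is taken from the examples on this page rather than from a formal schema.

```python
import time

TERMINAL_OK = {"completed", "succeeded"}


def poll_until_done(fetch, timeout=300.0, interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the prediction reaches a terminal status or the timeout expires.

    fetch is any zero-argument callable returning the decoded JSON body, e.g.
    lambda: requests.get(poll_url, headers=headers, timeout=30).json()
    """
    deadline = clock() + timeout
    while True:
        data = fetch()["data"]
        status = data["status"]
        if status in TERMINAL_OK:
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "Generation failed")
        if clock() >= deadline:
            raise TimeoutError(f"Prediction still '{status}' after {timeout} seconds")
        sleep(interval)


# Offline demonstration with canned responses instead of real HTTP calls.
responses = iter([
    {"data": {"status": "processing"}},
    {"data": {"status": "completed",
              "outputs": ["https://storage.atlascloud.ai/outputs/result.mp4"]}},
])
outputs = poll_until_done(lambda: next(responses), interval=0.0)
print(outputs[0])
```

Passing the HTTP call in as `fetch` keeps the loop independent of any particular client library and makes the timeout behavior easy to test without network access.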
GET /api/v1/model/prediction/{prediction_id}
Polling Example
import os
import time
import requests

prediction_id = "pred_abc123"
url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}

while True:
    response = requests.get(url, headers=headers)
    result = response.json()
    status = result["data"]["status"]
    print(f"Status: {status}")
    if status in ["completed", "succeeded"]:
        output_url = result["data"]["outputs"][0]
        print(f"Output URL: {output_url}")
        break
    elif status == "failed":
        print(f"Error: {result['data'].get('error', 'Unknown')}")
        break
    # Still processing, wait 3 seconds before polling again
    time.sleep(3)
Status Values
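processing: The request is still being processed.
completed: Generation finished; the output is available.
succeeded: Generation succeeded; the output is available.
failed: Generation failed; check the error field.
Completed Response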
{
    "data": {
        "id": "pred_abc123",
        "status": "completed",
        "outputs": [
            "https://storage.atlascloud.ai/outputs/result.mp4"
        ],
        "metrics": {
            "predict_time": 45.2
        },
        "created_at": "2025-01-01T00:00:00Z",
        "completed_at": "2025-01-01T00:00:10Z"
    }
}
Upload Files
Upload a file to Atlas Cloud storage to get a URL that can be used in API requests. Uploads use multipart/form-data.
POST /api/v1/model/uploadMedia
Upload Example
import os
import requests

url = "https://api.atlascloud.ai/api/v1/model/uploadMedia"
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}

# Keep the file open while the multipart request is sent
with open("image.png", "rb") as f:
    files = {"file": ("image.png", f, "image/png")}
    response = requests.post(url, headers=headers, files=files)

result = response.json()
download_url = result["data"]["download_url"]
print(f"File URL: {download_url}")
Response
{
    "data": {
        "download_url": "https://storage.atlascloud.ai/uploads/abc123/image.png",
        "file_name": "image.png",
        "content_type": "image/png",
        "size": 1024000
    }
}
Input Schema
The following parameters are accepted in the request body.
No parameters available.
Request Body Example
{
    "model": "bytedance/seedance-2.0/reference-to-video"
}
Output Schema
The API returns a prediction response that contains the URLs of the generated outputs.
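The response examples on this page appear in two shapes: some wrap the prediction in a `data` object, while others place `id`, `status`, and `outputs` at the top level. Until the shape is confirmed against the live API, a small helper can accept both. This is a defensive sketch; `extract_outputs` is a hypothetical name, not part of any Atlas Cloud SDK.

```python
def extract_outputs(response_json):
    """Return the list of output URLs from a prediction response.

    Accepts both shapes shown in the examples: {"data": {...}} and a bare
    prediction object. Returns an empty list if no outputs are present yet.
    """
    prediction = response_json.get("data") or response_json
    return list(prediction.get("outputs") or [])


wrapped = {"data": {"id": "pred_abc123", "status": "completed",
                    "outputs": ["https://storage.atlascloud.ai/outputs/result.mp4"]}}
bare = {"id": "pred_abc123", "status": "completed",
        "outputs": ["https://storage.atlascloud.ai/outputs/result.mp4"]}
still_running = {"id": "pred_abc123", "status": "processing"}

print(extract_outputs(wrapped))
print(extract_outputs(bare))
print(extract_outputs(still_running))
```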
Response Example
{
    "id": "pred_abc123",
    "status": "completed",
    "model": "model-name",
    "outputs": [
        "https://storage.atlascloud.ai/outputs/result.mp4"
    ],
    "metrics": {
        "predict_time": 45.2
    },
    "created_at": "2025-01-01T00:00:00Z",
    "completed_at": "2025-01-01T00:00:10Z"
}
Atlas Cloud Skills
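Atlas Cloud Skills integrates 300+ AI models directly into your AI coding assistant. Install it with a single command, then use natural language to generate images and videos and to chat with LLMs.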
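Installation
npx skills add AtlasCloudAI/atlas-cloud-skills
Set Up the API Key
Get an API key from the Atlas Cloud console and set it as an environment variable.
export ATLASCLOUD_API_KEY="your-api-key-here"
Features
Once installed, you can use natural language in your AI assistant to access all Atlas Cloud models.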
MCP Server
Atlas Cloud MCP Server connects your IDE to 300+ AI models through the Model Context Protocol. It works with any MCP-compatible client.
Installation
npx -y atlascloud-mcp
Configuration
Add the following configuration to your IDE's MCP settings file.
{
    "mcpServers": {
        "atlascloud": {
            "command": "npx",
            "args": [
                "-y",
                "atlascloud-mcp"
            ],
            "env": {
                "ATLASCLOUD_API_KEY": "your-api-key-here"
            }
        }
    }
}
1. Introduction
Seedance 2.0 is a state-of-the-art multimodal generative AI model designed for synchronized video and audio content creation. Developed by ByteDance and integrated into the CapCut/Dreamina platform as of March 2026, this model family advances the field of generative multimedia by combining sophisticated diffusion transformer architectures with physics-informed world modeling for realistic motion and spatial consistency.
Seedance 2.0’s significance lies in its Dual-Branch Diffusion Transformer (DB-DiT) architecture that jointly processes video and audio streams, enabling phoneme-level lip synchronization across multiple languages. Compared to previous iterations, it achieves substantially higher output usability rates and faster generation speeds. The two variants target different workloads: Seedance 2.0 delivers high-fidelity, cinematic-quality renders with enhanced lighting and texture detail, while Seedance 2.0 Fast provides a cost-effective, accelerated pipeline optimized for high throughput and rapid prototyping.
2. Key Features & Innovations
- Dual-Branch Diffusion Transformer Architecture: Seedance 2.0 integrates separate yet synchronized diffusion branches for video and audio, enabling tight coupling between visual motion and sound generation. This architecture improves motion realism and audio-visual coherence beyond previous generative models.
- World Model with Physics Simulation: The model incorporates a physics-based world modeling approach that simulates realistic object motion and spatial consistency over time. This leads to naturalistic dynamics and stable scene composition across generated video sequences.
- Rich Multimodal Input Support: Seedance 2.0 accepts diverse input formats including text prompts, up to 9 images, and up to 3 video or audio clips of 15 seconds each. This flexibility allows nuanced content creation workflows combining static, dynamic, and auditory cues.
- Phoneme-Level Lip Synchronization: The native audio generation pipeline supports lip-sync at the phoneme granularity in 8+ languages, ensuring high fidelity mouth movements closely match generated speech or singing.
- High Usability and Efficiency: The model achieves an estimated 90% usable output rate compared to an industry average of approximately 20%, reducing post-processing overhead. Additionally, it delivers a 30% inference speed advantage over predecessor systems.
- API Variants for Different Use Cases: The Seedance 2.0 endpoint is geared toward high fidelity and cinematic visual effects suitable for final production, while the Seedance 2.0 Fast variant offers roughly 3 times faster generation and approximately 91% cost savings at $0.022 per second of output, ideal for rapid iteration and volume workflows.
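The input limits in the feature list (up to 9 images, up to 3 video or audio clips, 15 seconds per clip) can be checked client-side before a request is submitted. The sketch below is hypothetical: the function name and argument layout are illustrative only, and it reads "up to 3 video or audio clips" as a separate limit of 3 for each media type, which the text does not make explicit.

```python
MAX_IMAGES = 9
MAX_CLIPS_PER_TYPE = 3   # assumption: the limit applies to video and audio separately
MAX_CLIP_SECONDS = 15


def validate_references(image_count, video_durations, audio_durations):
    """Return a list of problems; an empty list means the inputs fit the stated limits."""
    problems = []
    if image_count > MAX_IMAGES:
        problems.append(f"{image_count} images exceeds the limit of {MAX_IMAGES}")
    for kind, durations in (("video", video_durations), ("audio", audio_durations)):
        if len(durations) > MAX_CLIPS_PER_TYPE:
            problems.append(f"{len(durations)} {kind} clips exceeds the limit of {MAX_CLIPS_PER_TYPE}")
        for index, seconds in enumerate(durations):
            if seconds > MAX_CLIP_SECONDS:
                problems.append(f"{kind} clip {index} is {seconds}s, over the {MAX_CLIP_SECONDS}s limit")
    return problems


print(validate_references(4, [12.0, 8.5], [15.0]))  # []
print(validate_references(10, [20.0], []))          # two problems: image count and clip length
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.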
3. Model Architecture & Technical Details
Seedance 2.0 is built around the Dual-Branch Diffusion Transformer (DB-DiT), which separately processes video and audio streams via transformer-based denoising diffusion models while synchronizing generation steps to enforce audio-visual alignment. The system leverages a World Model that integrates physics simulation modules, enabling consistent spatial and temporal object behaviors within video sequences.
Training was conducted in multiple stages on large-scale, diverse datasets spanning images, videos, text captions, and audio recordings across multiple languages. Initial large-scale pre-training used resolutions from 720p to 1080p, followed by supervised fine-tuning (SFT) to improve text and visual prompt conditioning fidelity. Reinforcement Learning from Human Feedback (RLHF) then optimized the model against multi-dimensional reward models that jointly assess aesthetics, motion coherence, and audio-visual synchronization quality.
The training pipeline supports multiple aspect ratios including 9:16, 16:9, 1:1, and 4:3, and target output lengths from 4 to 60 seconds. Specialized modules enable the @ reference system for fine-grained control of creative elements based on provided input assets.
4. Performance Highlights
Seedance 2.0 was benchmarked on the comprehensive SeedVideoBench-2.0 suite, which evaluates generative video models across over 50 image-based and 24 video-based benchmarks covering diverse content domains and multi-modal tasks.
| Rank | Model | Developer | Score/Metric | Release Date |
|---|---|---|---|---|
| 1 | Kling 3.0 | External | Competitive | 2025 |
| 2 | Sora 2 | External | Competitive | 2025 |
| 3 | Seedance 2.0 | ByteDance | High audiovisual sync, motion realism | 2026 |
| 4 | Veo 3.1 | External | Strong baseline | 2025 |
Seedance 2.0 matches or exceeds these contemporary models in synchronized video-audio generation, demonstrating especially strong performance in phoneme-level lip synchronization and motion naturalism thanks to the World Model component. Its 30% speed improvement and 90% output usability rate reflect notable efficiency advancements.
5. Intended Use & Applications
- Social Media Content Creation: Efficiently generate engaging short videos with synchronized audio and visually rich effects, tailored for platforms like TikTok and Instagram.
- E-commerce Product Videos: Automatically produce dynamic product showcases combining text, image, and video inputs with realistic motion and sound to enhance online shopping experiences.
- Marketing Campaigns: Craft high-quality cinematic promotional content that integrates brand assets via the @ reference system for tailored storytelling and audience engagement.
- Music Videos: Generate synchronized visuals with phoneme-accurate lip-syncing for multilingual vocal tracks to support artist and record label promotional needs.
- Short Narrative Films: Create compelling narrative-driven video clips with coherent motion and spatial consistency, supporting indie filmmakers and content creators.
- Fashion and Luxury Showcases: Produce visually detailed and aesthetic presentations incorporating texture and lighting refinements for high-end brand communications.






