
Seedance 2.0 Reference-to-Video API by ByteDance
Multimodal video generation from reference images, videos, and audio. Supports video editing and extension.
Each second of generated 720p video is billed at $0.2419. The request costs $0.0112 per 1,000 tokens. The token count is computed as: (output video height × output video width × (input duration + output duration) × 24) / 1024. With a video input, the rate drops to $0.00688 per 1,000 tokens. With a video input at 720p resolution, the price is $0.1486 per second.
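The token formula above can be checked with a few lines of Python. This is a minimal sketch, not an official billing calculator: the function names are illustrative, and the 1280×720 frame size is an assumption (it is the size that reproduces the quoted per-second prices).

```python
# Rates quoted in the pricing note above (USD per 1,000 tokens).
RATE_PER_1K_TOKENS = 0.0112
RATE_PER_1K_TOKENS_VIDEO_INPUT = 0.00688


def token_count(height, width, input_seconds, output_seconds):
    """Tokens = (height * width * (input + output duration) * 24) / 1024."""
    return height * width * (input_seconds + output_seconds) * 24 / 1024


def estimate_cost(height, width, output_seconds, input_seconds=0.0, has_video_input=False):
    """Estimated request cost in USD, using the rates listed above."""
    rate = RATE_PER_1K_TOKENS_VIDEO_INPUT if has_video_input else RATE_PER_1K_TOKENS
    return token_count(height, width, input_seconds, output_seconds) / 1000 * rate


# One second of output at an assumed 1280x720 frame size, no video input.
print(round(estimate_cost(720, 1280, 1), 4))                        # 0.2419
# The same second billed at the video-input rate.
print(round(estimate_cost(720, 1280, 1, has_video_input=True), 4))  # 0.1486
```

Because the input duration is added to the output duration in the formula, a request with a reference video is charged for both, at the lower per-token rate.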
Code example
import os
import time

import requests

# Read the key from the environment; Python does not expand "$ATLASCLOUD_API_KEY" inside a string.
API_KEY = os.environ["ATLASCLOUD_API_KEY"]

# Step 1: Start video generation
generate_url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
data = {
    "model": "bytedance/seedance-2.0/reference-to-video",
    "prompt": "A beautiful sunset over the ocean with gentle waves",
    "width": 512,
    "height": 512,
    "duration": 3,
    "fps": 24,
}
generate_response = requests.post(generate_url, headers=headers, json=data)
generate_result = generate_response.json()
prediction_id = generate_result["data"]["id"]

# Step 2: Poll for result
poll_url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"


def check_status():
    while True:
        response = requests.get(poll_url, headers={"Authorization": f"Bearer {API_KEY}"})
        result = response.json()
        if result["data"]["status"] in ["completed", "succeeded"]:
            print("Generated video:", result["data"]["outputs"][0])
            return result["data"]["outputs"][0]
        elif result["data"]["status"] == "failed":
            # .get avoids a KeyError when the response carries no error field
            raise Exception(result["data"].get("error") or "Generation failed")
        else:
            # Still processing, wait 2 seconds
            time.sleep(2)


video_url = check_status()
Install
Install the required package for your language.
pip install requests
Authentication
All API requests require authentication with an API key. You can get your API key from the Atlas Cloud dashboard.
export ATLASCLOUD_API_KEY="your-api-key-here"
HTTP headers
import os

API_KEY = os.environ.get("ATLASCLOUD_API_KEY")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}
Never expose your API key in client-side code or public repositories. Use environment variables or a backend proxy instead.
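As a small illustration of the advice above, the header construction can be wrapped in a helper that fails fast when the variable is missing, instead of sending `Bearer None`. This is a sketch; `build_headers` is a hypothetical helper name, not part of any Atlas Cloud SDK.

```python
import os


def build_headers(env=None):
    """Return request headers, reading the key from ATLASCLOUD_API_KEY."""
    env = os.environ if env is None else env
    api_key = env.get("ATLASCLOUD_API_KEY")
    if not api_key:
        # Stop early rather than sending an unauthenticated request.
        raise RuntimeError("ATLASCLOUD_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

Passing a plain dictionary as `env` makes the helper easy to exercise in tests without touching the real environment.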
Submit a request
import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string.
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "your-model",
    "prompt": "A beautiful landscape"
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
Submit a request
Submit an asynchronous generation request. The API returns a prediction ID that you can use to check the status and retrieve the result.
/api/v1/model/generateVideo
Request body
import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string.
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "bytedance/seedance-2.0/reference-to-video",
    "input": {
        "prompt": "A beautiful sunset over the ocean with gentle waves"
    }
}
response = requests.post(url, headers=headers, json=data)
result = response.json()
print(f"Prediction ID: {result['id']}")
print(f"Status: {result['status']}")
Response
{
  "id": "pred_abc123",
  "status": "processing",
  "model": "model-name",
  "created_at": "2025-01-01T00:00:00Z"
}
Check status
Poll the prediction endpoint to check the current status of your request.
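Polling without an upper bound can hang forever if a job stalls. One way to bound the wait is sketched here, under the assumption that the response has the `data.status`, `data.outputs`, and `data.error` fields used elsewhere on this page. `wait_for_prediction` and its parameters are hypothetical names, and the network call is passed in as a callable so the logic can be tried without hitting the API.

```python
import time


def wait_for_prediction(fetch_status, interval_seconds=3.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll until the prediction finishes, fails, or the deadline passes.

    fetch_status is any callable returning the parsed JSON body of the
    prediction endpoint, e.g. lambda: requests.get(url, headers=headers).json().
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        data = fetch_status()["data"]
        status = data["status"]
        if status in ("completed", "succeeded"):
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "Generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Prediction still '{status}' after {timeout_seconds} seconds")
        sleep(interval_seconds)


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"data": {"status": "processing"}},
    {"data": {"status": "completed",
              "outputs": ["https://storage.atlascloud.ai/outputs/result.mp4"]}},
])
outputs = wait_for_prediction(lambda: next(responses), sleep=lambda seconds: None)
print(outputs[0])
```

The 600-second default is an arbitrary placeholder; pick a timeout that matches the clip lengths you request.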
/api/v1/model/prediction/{prediction_id}
Polling example
import os
import time

import requests

prediction_id = "pred_abc123"
url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"
# Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string.
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}
while True:
    response = requests.get(url, headers=headers)
    result = response.json()
    status = result["data"]["status"]
    print(f"Status: {status}")
    if status in ["completed", "succeeded"]:
        output_url = result["data"]["outputs"][0]
        print(f"Output URL: {output_url}")
        break
    elif status == "failed":
        print(f"Error: {result['data'].get('error', 'Unknown')}")
        break
    time.sleep(3)
Status values
processing: The request is still being processed.
completed: Generation has finished. Results are available.
succeeded: Generation succeeded. Results are available.
failed: Generation failed. Check the error field.
Completed response
{
  "data": {
    "id": "pred_abc123",
    "status": "completed",
    "outputs": [
      "https://storage.atlascloud.ai/outputs/result.mp4"
    ],
    "metrics": {
      "predict_time": 45.2
    },
    "created_at": "2025-01-01T00:00:00Z",
    "completed_at": "2025-01-01T00:00:10Z"
  }
}
Upload files
Upload files to Atlas Cloud storage and get a URL that you can use in your API requests. Use multipart/form-data for the upload.
/api/v1/model/uploadMedia
Upload example
import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/uploadMedia"
# Read the key from the environment; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string.
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}
with open("image.png", "rb") as f:
    # requests builds the multipart/form-data body from the files mapping
    files = {"file": ("image.png", f, "image/png")}
    response = requests.post(url, headers=headers, files=files)
result = response.json()
download_url = result["data"]["download_url"]
print(f"File URL: {download_url}")
Response
{
  "data": {
    "download_url": "https://storage.atlascloud.ai/uploads/abc123/image.png",
    "file_name": "image.png",
    "content_type": "image/png",
    "size": 1024000
  }
}
Input schema
The following parameters are accepted in the request body.
No parameters available.
Example request body
{
  "model": "bytedance/seedance-2.0/reference-to-video"
}
Output schema
The API returns a prediction response containing the URLs of the generated results.
Example response
{
  "id": "pred_abc123",
  "status": "completed",
  "model": "model-name",
  "outputs": [
    "https://storage.atlascloud.ai/outputs/result.mp4"
  ],
  "metrics": {
    "predict_time": 45.2
  },
  "created_at": "2025-01-01T00:00:00Z",
  "completed_at": "2025-01-01T00:00:10Z"
}
Atlas Cloud Skills
Atlas Cloud Skills brings more than 300 AI models directly into your AI coding assistant. Install it with a single command, then use natural language to generate images and videos and to chat with LLMs.
Install
npx skills add AtlasCloudAI/atlas-cloud-skills
Configure the API key
Get your API key from the Atlas Cloud dashboard and set it as an environment variable.
export ATLASCLOUD_API_KEY="your-api-key-here"
Features
Once installed, you can use natural language in your AI assistant to access all Atlas Cloud models.
MCP Server
The Atlas Cloud MCP server connects your IDE to more than 300 AI models through the Model Context Protocol. It works with any MCP-compatible client.
Install
npx -y atlascloud-mcp
Configuration
Add the following configuration to your IDE's MCP settings file.
{
  "mcpServers": {
    "atlascloud": {
      "command": "npx",
      "args": [
        "-y",
        "atlascloud-mcp"
      ],
      "env": {
        "ATLASCLOUD_API_KEY": "your-api-key-here"
      }
    }
  }
}
1. Introduction
Seedance 2.0 is a state-of-the-art multimodal generative AI model designed for synchronized video and audio content creation. Developed by ByteDance and integrated into the CapCut/Dreamina platform as of March 2026, this model family advances the field of generative multimedia by combining sophisticated diffusion transformer architectures with physics-informed world modeling for realistic motion and spatial consistency.
Seedance 2.0’s significance lies in its Dual-Branch Diffusion Transformer (DB-DiT) architecture that jointly processes video and audio streams, enabling phoneme-level lip synchronization across multiple languages. Compared to previous iterations, it achieves substantially higher output usability rates and faster generation speeds. The two variants target different workloads: Seedance 2.0 delivers high-fidelity, cinematic-quality renders with enhanced lighting and texture detail, while Seedance 2.0 Fast provides a cost-effective, accelerated pipeline optimized for high throughput and rapid prototyping.
2. Key Features & Innovations
- Dual-Branch Diffusion Transformer Architecture: Seedance 2.0 integrates separate yet synchronized diffusion branches for video and audio, enabling tight coupling between visual motion and sound generation. This architecture improves motion realism and audio-visual coherence beyond previous generative models.
- World Model with Physics Simulation: The model incorporates a physics-based world modeling approach that simulates realistic object motion and spatial consistency over time. This leads to naturalistic dynamics and stable scene composition across generated video sequences.
- Rich Multimodal Input Support: Seedance 2.0 accepts diverse input formats including text prompts, up to 9 images, and up to 3 video or audio clips of 15 seconds each. This flexibility allows nuanced content creation workflows combining static, dynamic, and auditory cues.
- Phoneme-Level Lip Synchronization: The native audio generation pipeline supports lip-sync at the phoneme granularity in 8+ languages, ensuring high fidelity mouth movements closely match generated speech or singing.
- High Usability and Efficiency: The model achieves an estimated 90% usable output rate compared to an industry average of approximately 20%, reducing post-processing overhead. Additionally, it delivers a 30% inference speed advantage over predecessor systems.
- API Variants for Different Use Cases: The Seedance 2.0 endpoint is geared toward high fidelity and cinematic visual effects suitable for final production, while the Seedance 2.0 Fast variant offers roughly 3 times faster generation and approximately 91% cost savings at $0.022 per second of output, ideal for rapid iteration and volume workflows.
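The input limits listed under Rich Multimodal Input Support lend themselves to a client-side check before any upload. The sketch below is only an illustration: `Clip` and `validate_references` are hypothetical names, and reading "up to 3 video or audio clips" as three of each kind is an assumption, since the text does not say whether the limit is shared.

```python
from dataclasses import dataclass

MAX_IMAGES = 9
MAX_CLIPS_PER_KIND = 3   # assumption: the limit applies to video and audio separately
MAX_CLIP_SECONDS = 15.0


@dataclass
class Clip:
    kind: str       # "video" or "audio"
    seconds: float  # clip duration


def validate_references(image_count, clips):
    """Return a list of problems; an empty list means the inputs fit the stated limits."""
    problems = []
    if image_count > MAX_IMAGES:
        problems.append(f"{image_count} images exceeds the limit of {MAX_IMAGES}")
    for kind in ("video", "audio"):
        count = sum(1 for clip in clips if clip.kind == kind)
        if count > MAX_CLIPS_PER_KIND:
            problems.append(f"{count} {kind} clips exceeds the limit of {MAX_CLIPS_PER_KIND}")
    for index, clip in enumerate(clips):
        if clip.seconds > MAX_CLIP_SECONDS:
            problems.append(f"clip {index} ({clip.kind}) is {clip.seconds}s, over {MAX_CLIP_SECONDS}s")
    return problems


# Ten images and one over-long video clip produce two problems.
print(validate_references(10, [Clip("video", 20.0)]))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once.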
3. Model Architecture & Technical Details
Seedance 2.0 is built around the Dual-Branch Diffusion Transformer (DB-DiT), which separately processes video and audio streams via transformer-based denoising diffusion models while synchronizing generation steps to enforce audio-visual alignment. The system leverages a World Model that integrates physics simulation modules, enabling consistent spatial and temporal object behaviors within video sequences.
Training was conducted in multiple stages on large-scale, diverse datasets spanning images, videos, text captions, and audio recordings across multiple languages. Initial large-scale pre-training used resolutions from 720p to 1080p, followed by supervised fine-tuning (SFT) to improve text and visual prompt conditioning fidelity. Reinforcement Learning from Human Feedback (RLHF) then optimized the model against multi-dimensional reward models that simultaneously assess aesthetics, motion coherence, and audio-visual synchronization quality.
The training pipeline supports multiple aspect ratios including 9:16, 16:9, 1:1, and 4:3, and target output lengths from 4 to 60 seconds. Specialized modules enable the @ reference system for fine-grained control of creative elements based on provided input assets.
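To make the listed aspect ratios concrete, the sketch below converts each one into a frame size. The 720-pixel short side is an assumption borrowed from the 720p pricing tier, and `frame_size` is a hypothetical helper; the dimensions the API actually produces may differ.

```python
from fractions import Fraction

SUPPORTED_RATIOS = ("9:16", "16:9", "1:1", "4:3")


def frame_size(aspect_ratio, short_side=720):
    """Return (width, height) for a listed ratio, given the length of the short side."""
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    w, h = (int(part) for part in aspect_ratio.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:
        # Landscape or square: the height is the short side.
        return int(short_side * ratio), short_side
    # Portrait: the width is the short side.
    return short_side, int(short_side / ratio)


for name in SUPPORTED_RATIOS:
    print(name, frame_size(name))
```

Using `Fraction` keeps the arithmetic exact, so 16:9 at a 720-pixel short side gives 1280×720 with no floating-point rounding.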
4. Performance Highlights
Seedance 2.0 was benchmarked on the comprehensive SeedVideoBench-2.0 suite, which evaluates generative video models across over 50 image-based and 24 video-based benchmarks covering diverse content domains and multi-modal tasks.
| Rank | Model | Developer | Score/Metric | Release Date |
|---|---|---|---|---|
| 1 | Kling 3.0 | External | Competitive | 2025 |
| 2 | Sora 2 | External | Competitive | 2025 |
| 3 | Seedance 2.0 | ByteDance | High audiovisual sync, motion realism | 2026 |
| 4 | Veo 3.1 | External | Strong baseline | 2025 |
Seedance 2.0 matches or exceeds these contemporary models in synchronized video-audio generation, demonstrating especially strong performance in phoneme-level lip synchronization and motion naturalism thanks to the World Model component. Its 30% speed improvement and 90% output usability rate reflect notable efficiency advancements.
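The usability figures translate into an expected number of generations per usable clip. The calculation below is a rough sketch that assumes each attempt succeeds independently with the stated probability, which the source does not claim; `expected_attempts` is an illustrative name.

```python
def expected_attempts(usable_rate):
    """Mean number of generations needed for one usable clip (geometric distribution)."""
    if not 0 < usable_rate <= 1:
        raise ValueError("usable_rate must be in (0, 1]")
    return 1 / usable_rate


print(round(expected_attempts(0.90), 2))  # 1.11
print(round(expected_attempts(0.20), 2))  # 5.0
```

Under that assumption, a 90% usable rate means about 1.1 generations per keeper, against 5 at a 20% rate.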
5. Intended Use & Applications
- Social Media Content Creation: Efficiently generate engaging short videos with synchronized audio and visually rich effects, tailored for platforms like TikTok and Instagram.
- E-commerce Product Videos: Automatically produce dynamic product showcases combining text, image, and video inputs with realistic motion and sound to enhance online shopping experiences.
- Marketing Campaigns: Craft high-quality cinematic promotional content that integrates brand assets via the @ reference system for tailored storytelling and audience engagement.
- Music Videos: Generate synchronized visuals with phoneme-accurate lip-syncing for multilingual vocal tracks to support artist and record label promotional needs.
- Short Narrative Films: Create compelling narrative-driven video clips with coherent motion and spatial consistency, supporting indie filmmakers and content creators.
- Fashion and Luxury Showcases: Produce visually detailed and aesthetic presentations incorporating texture and lighting refinements for high-end brand communications.






