
Seedance 2.0 Image-to-Video API by ByteDance
Generate videos from a first-frame image (and optional last-frame) with native audio.
Pricing: You are charged $0.2419 per second of generated 720p video. Requests are billed at $0.0112 per 1,000 tokens. The token count is computed as (output video height × output video width × (input duration + output duration) × 24) / 1024. With video inputs, the price drops to $0.00688 per 1,000 tokens. With video inputs at 720p resolution, the price is $0.1486 per second.
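The token formula above can be checked with a few lines of arithmetic. The following is a minimal sketch, not an official billing calculator: the function names are hypothetical, and it assumes that "720p" means 1280 × 720 pixels and that durations are given in seconds.

```python
# Rates quoted in the pricing note above (USD per 1,000 tokens).
RATE_PER_1K_TOKENS = 0.0112         # standard requests
RATE_PER_1K_TOKENS_VIDEO = 0.00688  # requests that include video inputs


def token_count(width: int, height: int,
                input_seconds: float, output_seconds: float) -> float:
    """Token count per the formula: (height x width x (in + out) x 24) / 1024."""
    return height * width * (input_seconds + output_seconds) * 24 / 1024


def estimate_cost(width: int, height: int, output_seconds: float,
                  input_seconds: float = 0.0, video_input: bool = False) -> float:
    """Estimated request cost in USD (hypothetical helper, for illustration)."""
    rate = RATE_PER_1K_TOKENS_VIDEO if video_input else RATE_PER_1K_TOKENS
    return token_count(width, height, input_seconds, output_seconds) / 1000 * rate


if __name__ == "__main__":
    # Assumption: 720p is 1280 x 720. One output second, no input duration.
    print(f"{estimate_cost(1280, 720, 1):.4f}")                    # 0.2419
    print(f"{estimate_cost(1280, 720, 1, video_input=True):.4f}")  # 0.1486
```

Under these assumptions one second of 720p output is 21,600 tokens, which reproduces both per-second prices quoted above.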
Code example
import os
import time

import requests

# Read the key from the environment. Python does not expand a literal
# "$ATLASCLOUD_API_KEY" inside a string, so it must be looked up explicitly.
API_KEY = os.environ["ATLASCLOUD_API_KEY"]
auth = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Start video generation
generate_url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {"Content-Type": "application/json", **auth}
data = {
    "model": "bytedance/seedance-2.0/image-to-video",
    "prompt": "A beautiful sunset over the ocean with gentle waves",
    "width": 512,
    "height": 512,
    "duration": 3,
    "fps": 24,
}
generate_response = requests.post(generate_url, headers=headers, json=data)
generate_result = generate_response.json()
prediction_id = generate_result["data"]["id"]

# Step 2: Poll for result
poll_url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"


def check_status():
    while True:
        response = requests.get(poll_url, headers=auth)
        result = response.json()
        status = result["data"]["status"]
        if status in ["completed", "succeeded"]:
            print("Generated video:", result["data"]["outputs"][0])
            return result["data"]["outputs"][0]
        elif status == "failed":
            # Use .get so a missing error field does not raise KeyError
            raise Exception(result["data"].get("error") or "Generation failed")
        else:
            # Still processing, wait 2 seconds
            time.sleep(2)


video_url = check_status()
Install
Install the required package for your programming language.
pip install requests
Authentication
All API requests require authentication with an API key. You can get your API key from the Atlas Cloud dashboard.
export ATLASCLOUD_API_KEY="your-api-key-here"
HTTP headers
import os

API_KEY = os.environ.get("ATLASCLOUD_API_KEY")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}
Never expose your API key in client-side code or public repositories. Use environment variables or a backend proxy instead.
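If the environment variable is unset, `os.environ.get` returns `None` and the header silently becomes `Bearer None`. A minimal sketch of a fail-fast alternative follows; `auth_headers` is a hypothetical helper name, not part of any Atlas Cloud SDK.

```python
import os
from typing import Mapping


def auth_headers(env: Mapping[str, str] = os.environ,
                 var: str = "ATLASCLOUD_API_KEY") -> dict:
    """Build request headers, raising early if the API key is missing.

    Hypothetical helper for illustration; `env` is injectable so the
    function can be exercised without touching the real environment.
    """
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

Calling `auth_headers()` once at startup surfaces a missing key immediately instead of as an authentication error on the first request.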
Send a request
import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    # Look the key up explicitly; "$ATLASCLOUD_API_KEY" is not expanded inside a Python string
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "your-model",
    "prompt": "A beautiful landscape"
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
Send a request
Send an asynchronous generation request. The API returns a prediction ID that you can use to check the status and retrieve the result.
POST /api/v1/model/generateVideo
Request body
import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/generateVideo"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"
}
data = {
    "model": "bytedance/seedance-2.0/image-to-video",
    "input": {
        "prompt": "A beautiful sunset over the ocean with gentle waves"
    }
}
response = requests.post(url, headers=headers, json=data)
result = response.json()
# Other examples on this page nest the prediction under "data",
# so accept both the wrapped and the top-level shape.
prediction = result.get("data", result)
print(f"Prediction ID: {prediction['id']}")
print(f"Status: {prediction['status']}")
Response
{
    "id": "pred_abc123",
    "status": "processing",
    "model": "model-name",
    "created_at": "2025-01-01T00:00:00Z"
}
Check status
Poll the prediction endpoint to check the current status of your request.
GET /api/v1/model/prediction/{prediction_id}
Polling example
import os
import time

import requests

prediction_id = "pred_abc123"
url = f"https://api.atlascloud.ai/api/v1/model/prediction/{prediction_id}"
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}

while True:
    response = requests.get(url, headers=headers)
    result = response.json()
    status = result["data"]["status"]
    print(f"Status: {status}")
    if status in ["completed", "succeeded"]:
        output_url = result["data"]["outputs"][0]
        print(f"Output URL: {output_url}")
        break
    elif status == "failed":
        print(f"Error: {result['data'].get('error', 'Unknown')}")
        break
    # Still processing: wait 3 seconds before the next poll
    time.sleep(3)
Status values
- processing: The request is still being processed.
- completed: Generation has finished. Results are available.
- succeeded: Generation was successful. Results are available.
- failed: Generation failed. Check the error field.
Completed response
{
    "data": {
        "id": "pred_abc123",
        "status": "completed",
        "outputs": [
            "https://storage.atlascloud.ai/outputs/result.mp4"
        ],
        "metrics": {
            "predict_time": 45.2
        },
        "created_at": "2025-01-01T00:00:00Z",
        "completed_at": "2025-01-01T00:00:10Z"
    }
}
Upload files
Upload files to Atlas Cloud storage and receive a URL that you can use in your API requests. Use multipart/form-data for the upload.
POST /api/v1/model/uploadMedia
Upload example
import os

import requests

url = "https://api.atlascloud.ai/api/v1/model/uploadMedia"
headers = {"Authorization": f"Bearer {os.environ['ATLASCLOUD_API_KEY']}"}

# requests builds the multipart/form-data body from the `files` argument
with open("image.png", "rb") as f:
    files = {"file": ("image.png", f, "image/png")}
    response = requests.post(url, headers=headers, files=files)

result = response.json()
download_url = result["data"]["download_url"]
print(f"File URL: {download_url}")
Response
{
    "data": {
        "download_url": "https://storage.atlascloud.ai/uploads/abc123/image.png",
        "file_name": "image.png",
        "content_type": "image/png",
        "size": 1024000
    }
}
Input schema
The following parameters are accepted in the request body.
No parameters available.
Example request body
{
    "model": "bytedance/seedance-2.0/image-to-video"
}
Output schema
The API returns a prediction response containing the generated output URLs.
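The response samples on this page show the prediction both nested under a "data" key and at the top level. The sketch below, which assumes either shape may be returned, polls until one of the documented terminal statuses is reached and returns the output URLs. `unwrap` and `wait_for_outputs` are hypothetical names, and the `fetch` callable is injected so the logic can be run without network access.

```python
import time
from typing import Callable

# Terminal success statuses, taken from the status values listed on this page.
TERMINAL_OK = {"completed", "succeeded"}


def unwrap(payload: dict) -> dict:
    """Return the prediction object whether or not it is nested under "data"."""
    inner = payload.get("data")
    return inner if isinstance(inner, dict) else payload


def wait_for_outputs(fetch: Callable[[], dict],
                     interval: float = 3.0,
                     timeout: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> list:
    """Call fetch() until the prediction finishes, then return its outputs.

    Raises RuntimeError on a "failed" status and TimeoutError if no
    terminal status is seen within `timeout` seconds. The default
    interval and timeout are illustrative assumptions, not documented limits.
    """
    deadline = time.monotonic() + timeout
    while True:
        prediction = unwrap(fetch())
        status = prediction.get("status")
        if status in TERMINAL_OK:
            return prediction.get("outputs", [])
        if status == "failed":
            raise RuntimeError(prediction.get("error") or "Generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still {status!r} after {timeout} s")
        sleep(interval)
```

With the `requests` library, `fetch` could be `lambda: requests.get(poll_url, headers=headers, timeout=30).json()`, where `poll_url` and `headers` are built as in the polling example earlier on this page.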
Example response
{
    "id": "pred_abc123",
    "status": "completed",
    "model": "model-name",
    "outputs": [
        "https://storage.atlascloud.ai/outputs/result.mp4"
    ],
    "metrics": {
        "predict_time": 45.2
    },
    "created_at": "2025-01-01T00:00:00Z",
    "completed_at": "2025-01-01T00:00:10Z"
}
Atlas Cloud Skills
Atlas Cloud Skills integrates more than 300 AI models directly into your AI coding assistant. Install it with one command, then use natural language to generate images and videos and to chat with LLMs.
Install
npx skills add AtlasCloudAI/atlas-cloud-skills
Set up your API key
Get your API key from the Atlas Cloud dashboard and set it as an environment variable.
export ATLASCLOUD_API_KEY="your-api-key-here"
Features
After installation, you can use natural language in your AI assistant to access all Atlas Cloud models.
MCP server
The Atlas Cloud MCP server connects your IDE to more than 300 AI models through the Model Context Protocol. It works with any MCP-compatible client.
Install
npx -y atlascloud-mcp
Configuration
Add the following configuration to your IDE's MCP settings file.
{
    "mcpServers": {
        "atlascloud": {
            "command": "npx",
            "args": [
                "-y",
                "atlascloud-mcp"
            ],
            "env": {
                "ATLASCLOUD_API_KEY": "your-api-key-here"
            }
        }
    }
}
1. Introduction
Seedance 2.0 is a multimodal generative AI model for synchronized video and audio creation. Developed by ByteDance and integrated into the CapCut/Dreamina platform as of March 2026, the model family combines diffusion transformer architectures with physics-informed world modeling for realistic motion and spatial consistency.
Seedance 2.0 is built on a Dual-Branch Diffusion Transformer (DB-DiT) architecture that jointly processes video and audio streams, enabling phoneme-level lip synchronization across multiple languages. Compared with previous iterations, it reports a higher usable-output rate (an estimated 90%) and roughly 30% faster inference. The two variants target different workloads: Seedance 2.0 delivers high-fidelity, cinematic-quality renders with enhanced lighting and texture detail, while Seedance 2.0 Fast provides a cost-effective, accelerated pipeline optimized for high throughput and rapid prototyping.
2. Key Features & Innovations
- Dual-Branch Diffusion Transformer Architecture: Seedance 2.0 integrates separate yet synchronized diffusion branches for video and audio, enabling tight coupling between visual motion and sound generation. This architecture improves motion realism and audio-visual coherence beyond previous generative models.
- World Model with Physics Simulation: The model incorporates a physics-based world modeling approach that simulates realistic object motion and spatial consistency over time. This leads to naturalistic dynamics and stable scene composition across generated video sequences.
- Rich Multimodal Input Support: Seedance 2.0 accepts diverse input formats including text prompts, up to 9 images, and up to 3 video or audio clips of 15 seconds each. This flexibility allows nuanced content creation workflows combining static, dynamic, and auditory cues.
- Phoneme-Level Lip Synchronization: The native audio generation pipeline supports lip-sync at phoneme granularity in 8+ languages, so that mouth movements closely match generated speech or singing.
- High Usability and Efficiency: The model achieves an estimated 90% usable output rate compared to an industry average of approximately 20%, reducing post-processing overhead. Additionally, it delivers a 30% inference speed advantage over predecessor systems.
- API Variants for Different Use Cases: The Seedance 2.0 endpoint is geared toward high fidelity and cinematic visual effects suitable for final production, while the Seedance 2.0 Fast variant offers roughly 3 times faster generation and approximately 91% cost savings at $0.022 per second of output, ideal for rapid iteration and volume workflows.
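The quoted savings for the Fast variant can be sanity-checked with simple arithmetic. This sketch assumes the 91% figure is measured against the $0.2419 per second 720p rate from the pricing note at the top of this page; the source does not state the baseline explicitly, and `savings_fraction` is a hypothetical helper name.

```python
STANDARD_PER_SECOND = 0.2419  # USD, 720p rate from the pricing note (assumed baseline)
FAST_PER_SECOND = 0.022       # USD, Seedance 2.0 Fast rate quoted above


def savings_fraction(standard: float, fast: float) -> float:
    """Fraction of the standard price saved by paying the fast price."""
    return 1 - fast / standard


if __name__ == "__main__":
    saved = savings_fraction(STANDARD_PER_SECOND, FAST_PER_SECOND)
    print(f"Savings: {saved:.0%}")  # Savings: 91%
    # Cost of a 10-second clip under each rate
    print(f"Standard: ${10 * STANDARD_PER_SECOND:.2f}, Fast: ${10 * FAST_PER_SECOND:.2f}")
```

Under that assumption the two prices are consistent with the stated savings of approximately 91%.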
3. Model Architecture & Technical Details
Seedance 2.0 is built around the Dual-Branch Diffusion Transformer (DB-DiT), which separately processes video and audio streams via transformer-based denoising diffusion models while synchronizing generation steps to enforce audio-visual alignment. The system leverages a World Model that integrates physics simulation modules, enabling consistent spatial and temporal object behaviors within video sequences.
Training was conducted in multiple stages on large-scale, diverse datasets spanning images, videos, text captions, and audio recordings across multiple languages. Initial large-scale pre-training used resolutions from 720p to 1080p, followed by supervised fine-tuning (SFT) to improve fidelity to text and visual prompt conditioning. Reinforcement Learning from Human Feedback (RLHF) then optimized against multi-dimensional reward models that jointly assess aesthetics, motion coherence, and audio-visual synchronization quality.
The training pipeline supports multiple aspect ratios including 9:16, 16:9, 1:1, and 4:3, and target output lengths from 4 to 60 seconds. Specialized modules enable the @ reference system for fine-grained control of creative elements based on provided input assets.
4. Performance Highlights
Seedance 2.0 was benchmarked on the comprehensive SeedVideoBench-2.0 suite, which evaluates generative video models across over 50 image-based and 24 video-based benchmarks covering diverse content domains and multi-modal tasks.
| Rank | Model | Developer | Score/Metric | Release Date |
|---|---|---|---|---|
| 1 | Kling 3.0 | External | Competitive | 2025 |
| 2 | Sora 2 | External | Competitive | 2025 |
| 3 | Seedance 2.0 | ByteDance | High audiovisual sync, motion realism | 2026 |
| 4 | Veo 3.1 | External | Strong baseline | 2025 |
Seedance 2.0 matches or exceeds these contemporary models in synchronized video-audio generation, demonstrating especially strong performance in phoneme-level lip synchronization and motion naturalism thanks to the World Model component. Its 30% speed improvement and 90% output usability rate reflect notable efficiency advancements.
5. Intended Use & Applications
- Social Media Content Creation: Efficiently generate engaging short videos with synchronized audio and visually rich effects, tailored for platforms like TikTok and Instagram.
- E-commerce Product Videos: Automatically produce dynamic product showcases combining text, image, and video inputs with realistic motion and sound to enhance online shopping experiences.
- Marketing Campaigns: Craft high-quality cinematic promotional content that integrates brand assets via the @ reference system for tailored storytelling and audience engagement.
- Music Videos: Generate synchronized visuals with phoneme-accurate lip-syncing for multilingual vocal tracks to support artist and record label promotional needs.
- Short Narrative Films: Create compelling narrative-driven video clips with coherent motion and spatial consistency, supporting indie filmmakers and content creators.
- Fashion and Luxury Showcases: Produce visually detailed and aesthetic presentations incorporating texture and lighting refinements for high-end brand communications.






