Kling V3.0 API: AI Director Video with Native Audio

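The Kling 3.0 API brings Kuaishou's flagship video suite to Atlas Cloud through a single OpenAI-compatible key. It includes two models: Kling 3.0 for AI Director storytelling, multilingual lip-sync, and accurate on-screen text, and Kling 3.0 Omni (O3) for subject and voice cloning from a short video or an image. Both generate native audio in the same pass and support output up to 4K. Build cinematic storytelling, global marketing, multilingual ads, and serialized character content on reliable infrastructure.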

Explore Leading Models

Atlas Cloud gives you the latest industry-leading creative models.

Kling V3.0 Turbo Text-to-Video

Kling V3.0 Turbo Text-to-Video generates dynamic cinematic videos from text prompts using MVL technology. Supports first/last frame control and audio generation.

Kling V3.0 Turbo Image-to-Video

Kling V3.0 Turbo Image-to-Video transforms static images into dynamic cinematic videos using MVL technology. Supports first/last frame control and audio generation.

Kling Video O3 4K Text-to-Video

Kling Omni Video O3 (4K) is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Generates high-quality videos from text prompts with natural motion and audio generation support.

Kling Video O3 4K Image-to-Video

Kling Omni Video O3 (4K) Image-to-Video transforms static images into dynamic cinematic videos using MVL technology. Supports first/last frame control and audio generation.

Kling v3.0 4K Image-to-Video

Kling v3.0 4K Image-to-Video model by Kuaishou. High-quality video generation from images.

Kling v3.0 Std Image-to-Video

Kling v3.0 Standard Image-to-Video model by Kuaishou. High-quality video generation from images.

Kling v3.0 Pro Image-to-Video

Kling v3.0 Professional Image-to-Video model by Kuaishou. Premium quality video generation from images with advanced features.

Kling v3.0 Pro Text-to-Video

Kling v3.0 Professional Text-to-Video model by Kuaishou. Premium quality video generation from text prompts with advanced features.

Kling v3.0 4K Text-to-Video

Kling v3.0 4K Text-to-Video model by Kuaishou. High-quality video generation from text prompts.

Kling v3.0 Std Text-to-Video

Kling v3.0 Standard Text-to-Video model by Kuaishou. High-quality video generation from text prompts.

Kling Video O3 Pro Text-to-Video

Kling Omni Video O3 is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Professional quality with enhanced motion and detail.

Kling Video O3 Pro Image-to-Video

Kling Omni Video O3 Image-to-Video transforms static images into dynamic cinematic videos using MVL technology. Professional quality with first/last frame control and audio generation.

Kling Video O3 Pro Reference-to-Video

Kling Omni Video O3 Reference-to-Video generates creative videos using character, prop, or scene references. Professional quality with up to 7 reference images and optional video input.

Kling Video O3 Pro Video-Edit

Kling Omni Video O3 Video-Edit enables conversational video editing through natural language commands. Professional quality with object removal/replacement, background changes, and effects.

Kling Video O3 Std Video-Edit

Kling Omni Video O3 Video-Edit (Standard) enables natural-language video edits: remove or replace objects, change backgrounds, add effects, and more. Video duration limited to 10s.

Kling Video O3 Std Reference-to-Video

Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos using character, prop, or scene references. Supports up to 7 reference images and optional video input.

Kling Video O3 Std Image-to-Video

Kling Omni Video O3 (Standard) Image-to-Video transforms static images into dynamic cinematic videos using MVL technology. Supports first/last frame control and audio generation.

Kling Video O3 Std Text-to-Video

Kling Omni Video O3 (Standard) is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Generates high-quality videos from text prompts with natural motion and audio generation support.

From $0.071/second (list price $0.084/second, -15%)

Peak speed

Lowest cost
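As a rough illustration of per-second billing, the sketch below multiplies a clip length by the two per-second figures listed above. It assumes billing is linear in output seconds and that the lower figure is the discounted rate. The function names and the rounding rule are illustrative and do not come from Atlas Cloud.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-second figures quoted on this page (USD).
LIST_RATE = Decimal("0.084")
DISCOUNTED_RATE = Decimal("0.071")


def estimate_cost(seconds: int, rate: Decimal = DISCOUNTED_RATE) -> Decimal:
    """Estimate the cost of one clip, assuming billing is linear in seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Round to a tenth of a cent; the real billing granularity is unknown.
    return (rate * seconds).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


def discount_percent(list_rate: Decimal, rate: Decimal) -> Decimal:
    """Return the discount of `rate` relative to `list_rate`, in percent."""
    pct = (list_rate - rate) / list_rate * 100
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print("10 s clip at discounted rate:", estimate_cost(10))
    print("10 s clip at list rate:", estimate_cost(10, LIST_RATE))
    print("discount (%):", discount_percent(LIST_RATE, DISCOUNTED_RATE))
```

With these two figures the computed discount is about 15.5%, which the page displays as 15%.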

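Modality	Description
Kling 3.0 Std T2V API (Text To Video)	The Kling 3.0 Std T2V API lets developers turn text prompts into cinematic video clips. By defining camera moves, scenes, and actions, it generates smooth content with synchronized audio and video, optimized for professional storyboarding, dynamic marketing, and social media storytelling.
Kling 3.0 Std I2V API (Image To Video)	The Kling 3.0 Std I2V API converts a static image and a text prompt into a video clip. With reference-frame and last-frame control, it guides the motion trajectory and generates audio-synced content for visual continuity and standard marketing assets.
Kling 3.0 Pro T2V API (Text To Video)	The Kling 3.0 Pro T2V API generates high-fidelity video with advanced physics and cinematic textures from text prompts. It supports multi-shot storytelling and delivers more detail and visual complexity than the Standard tier.
Kling 3.0 Pro I2V API (Image To Video)	The Kling 3.0 Pro I2V API converts images into high-resolution video with enhanced detail preservation. It provides professional-grade camera control and precise audio-visual sync for high-end commercial production.
Kling Video O3 Std T2V API (Text To Video)	The Kling Video O3 Std T2V API generates video from text and supports native audio generation.
Kling Video O3 Std I2V API (Image To Video)	The Kling Video O3 Std I2V API generates video from an image and text with high fidelity to the reference. It is designed for tasks that need stable character or product rendering in standard-resolution workflows.
Kling Video O3 Std R2V (Video To Video)	The Kling Video O3 Std R2V API generates creative videos from character, prop, or scene references. It supports up to 7 reference images and an optional video input, and offers video restyling and attribute editing for standard-quality social media and experimental content.
Kling Video O3 Std Video Edit API (Video To Video)	The Kling Video O3 Std Video Edit API supports natural-language video editing: remove or replace objects, change backgrounds, add effects, and more.
Kling Video O3 Pro T2V API (Text To Video)	The Kling Video O3 Pro T2V API provides text-to-video generation, with professional-grade character consistency and cinematic lighting in complex scenes for film-quality storytelling.
Kling Video O3 Pro I2V API (Image To Video)	The Kling Video O3 Pro I2V API uses a reference-first architecture to convert images into professional-quality video. It preserves visual detail with high fidelity and keeps motion smooth, for high-end digital marketing and visual effects.
Kling Video O3 Pro R2V (Video To Video)	Kling Video O3 Pro R2V provides video transformation and restyling, with pixel-level control and motion stability for professional video editing and high-end visual modification.
Kling Video O3 Pro Video Edit (Video To Video)	Kling Video O3 Pro Video Edit makes high-quality video modifications from natural-language prompts. It offers advanced object removal, background replacement, and effects compositing, with professional precision and strong detail preservation.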

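Kling 3.0 API Features and Showcase

The Kling 3.0 API brings Kuaishou's cinematic toolkit to Atlas Cloud: an AI Director for multi-shot storytelling, multilingual lip-sync and on-screen text, subject and voice cloning, native audio, reference control, and output up to 4K.

Intelligent Cinematic Storytelling (Kling 3.0)

Kling 3.0 introduces an "AI Director" that reads the narrative arc from a prompt and automatically plans shot composition and camera angles, enabling advanced film techniques such as shot-reverse-shot dialogue sequences. It delivers a finished visual narrative in a single generation, which puts complex cinematic language within reach of any creator.

Native Audio in a Single Pass

Kling 3.0 generates speech, sound effects, and background audio in the same pass as the video, so the finished clip arrives with sound already matched to the action. No separate audio model or post-production step is needed, which keeps dialogue, effects, and ambience in sync with the picture.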

Native 4K Output

Kling 3.0 renders at resolutions up to native 4K, holding fine texture, lighting, and depth that survive on large screens and tight crops. The same prompt scales from quick standard-resolution drafts to a high-resolution master, so previews and final renders come from one model.

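Multilingual Lip-Sync and High-Fidelity Text (Kling 3.0)

Kling 3.0 maps text to on-screen characters accurately and supports dialogue in Chinese, English, Japanese, Korean, and Spanish, including mixed-language lines and dialects, with natural lip-sync. This directly serves e-commerce and global marketing needs for high-fidelity text display and localized content.

Professional-Grade Subject Consistency (Kling O3)

Kling O3 extracts a person's features from an uploaded or recorded 3 to 8 second video and reproduces their face, build, and mannerisms. This lets creators star in their own films, and it suits short dramas and serialized content that demand strict character consistency.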

Reference-to-Video and Multi-Element Control

Kling O3 takes up to 7 reference images plus an optional video to lock characters, props, and scenes across a generation. It reproduces each referenced element faithfully, so a specific face, object, and setting stay consistent shot to shot, the foundation for branded series and template-style content.
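A minimal sketch of how a client might enforce the reference limit before sending a reference-to-video job. Only the limit of 7 images and the optional video come from this page. The field names (`reference_images`, `reference_video`) and the model identifier are hypothetical placeholders, not the documented Atlas Cloud schema.

```python
from typing import Any, Dict, List, Optional

# Limit stated on this page for Kling O3 reference-to-video.
MAX_REFERENCE_IMAGES = 7


def build_r2v_payload(
    prompt: str,
    reference_images: List[str],
    reference_video: Optional[str] = None,
    model: str = "kling-o3-std-r2v",  # hypothetical model ID
) -> Dict[str, Any]:
    """Build a request body, rejecting more references than the page allows."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images are supported, "
            f"got {len(reference_images)}"
        )
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "reference_images": list(reference_images),  # hypothetical field name
    }
    if reference_video is not None:
        payload["reference_video"] = reference_video  # hypothetical field name
    return payload
```

Validating on the client avoids spending a request on a job that would exceed the stated limit.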

One Prompt, Many Models: Kling 3.0 API

Run the same prompt through the Kling 3.0 API and other leading video models on Atlas Cloud, and compare how each handles cinematic motion, character consistency, and audio in a single scene.

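Prompt

Cinematic multi-shot action sequence, 10 seconds long. Shot 1, low tracking: a lone rider gallops across a windswept desert ridge at golden hour, dust kicking up behind the hooves. Shot 2, hard cut to side tracking: the horse leaps a deep ravine, its mane and the rider's cloak whipping in the wind mid-air. Shot 3, whip pan to high aerial: the rider weaves between towering rock pillars as a sandstorm rolls in behind. Shot 4, fast push-in: close-up on the rider's determined eyes beneath a tattered hood, grit streaking past the lens. Shot 5, dramatic wide: horse and rider skid to a halt at a cliff edge overlooking a vast canyon, cloak billowing, sunlight bursting into a flare. Dynamic camera work, volumetric light, flying dust and sand, photorealistic.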

Kling V3.0

Seedance 2.0

Kling V2.6 Pro
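The sample prompt above is written as a numbered shot list, each entry pairing a camera move with an action. The sketch below assembles a prompt in that same shape from structured data. The "Shot N, camera: action" layout simply mirrors the sample and is not a documented prompt syntax. The `Shot` class and `build_shot_prompt` function are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    camera: str  # e.g. "low tracking", "fast push-in"
    action: str  # what happens in the shot


def build_shot_prompt(shots: List[Shot], duration_seconds: int, style: str = "") -> str:
    """Join shots into one prompt string in the shape of the sample prompt."""
    if not shots:
        raise ValueError("at least one shot is required")
    parts = [f"Cinematic multi-shot sequence, {duration_seconds} seconds long."]
    for number, shot in enumerate(shots, start=1):
        # Strip a trailing period so each sentence ends with exactly one.
        parts.append(f"Shot {number}, {shot.camera}: {shot.action.rstrip('.')}.")
    if style:
        parts.append(style.rstrip(".") + ".")
    return " ".join(parts)
```

Keeping shots as data makes it easy to reorder or swap a single shot and resend the same sequence to several models for comparison.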

What You Can Build with the Kling 3.0 API

From cinematic storytelling and multilingual marketing to character cloning and precise video editing, the Kling 3.0 API turns text, images, and reference clips into production-ready video with native audio.

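Dynamic Physics Simulation with the Kling 3.0 API

Kling 3.0 uses advanced physics modeling to generate realistic interactions between complex objects, including fluid dynamics, cloth motion, and structural collisions. By simulating real-world gravity and material properties, the API produces high-fidelity motion for professional visual effects, realistic product ads, and technical demonstrations that need physical accuracy.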

Cinematic Storytelling with an AI Director

Kling 3.0 reads a prompt like a shot list and plans the sequence for you, setting shot composition, camera angles, and transitions, including shot-reverse-shot dialogue. It delivers a multi-shot visual narrative in a single generation instead of one isolated clip, a fast path to previs, trailers, and social hooks without booking a crew.

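Precise Video Editing and Transformation with the Kling 3.0 API

The Kling 3.0 API performs complex video-to-video modifications from natural-language instructions, including background replacement, object removal, and style transfer. It changes specific visual attributes while preserving the original motion structure, which streamlines post-production for creative agencies and social media platforms that need efficient, high-resolution content iteration.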

Subject and Voice Cloning for Serialized Content

Kling O3 extracts a character's appearance and voice from a short 3 to 8 second video or an image, then reproduces that subject across new clips with matching lip-sync. It keeps a face, build, and voice consistent from episode to episode, which suits short dramas, digital hosts, and serialized social content where the same character has to return on demand.

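Consistent Character Storytelling with the Kling 3.0 API

Using reference-driven generation, Kling 3.0 keeps characters and style consistent across multiple generated clips. Developers can build coherent multi-shot sequences with stable facial features and ambient lighting, which suits digital humans, serialized stories, and brand-consistent marketing campaigns that need visual unity.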

Multilingual Dialogue and On-Screen Text

Kling 3.0 renders crisp, readable on-screen text and speaks in multiple languages, with natural lip-sync across Chinese, English, Japanese, Korean, and Spanish, plus mixed-language delivery in one clip. You can assign dialogue to each character so scenes with several speakers stay clear, which fits e-commerce, localized campaigns, and global marketing that depend on accurate text and voice.

How the Kling 3.0 API Compares

See how the Kling 3.0 API lines up against other leading video models on inputs, duration, resolution, and native audio, so you can match each project to the model that fits.

Model	Input types	Output duration	Resolution	Audio generation
Kling 3.0	Text, image, video	5s, 10s	720P	Yes
Kling O1	Text, image	5s, 10s	720P	No
Kling 2.6	Text, image, video	5s, 10s	720P	Yes
Seedance 2.0	Text, image, video, audio	4-15s	2K, 1080P, 720P, 480P	Yes
Veo 3.1	Text, image	4s, 6s, 8s	1080P, 720P	Yes
Wan 2.6	Text, image, video, audio	5s, 10s, 15s	1080P, 720P	Yes
Hailuo 2.3	Text, image	5s	1080P	No
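To match a project to a model programmatically, the comparison table can be held as data and filtered by requirement. The sketch below copies the table's values as listed; the `VideoModel` class and `find_models` function are illustrative names, and `max_seconds` is taken as the longest duration shown in each row.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class VideoModel:
    name: str
    inputs: FrozenSet[str]
    max_seconds: int
    resolutions: FrozenSet[str]
    native_audio: bool


def _fs(*items: str) -> FrozenSet[str]:
    return frozenset(items)


# Values transcribed from the comparison table on this page.
MODELS = [
    VideoModel("Kling 3.0", _fs("text", "image", "video"), 10, _fs("720P"), True),
    VideoModel("Kling O1", _fs("text", "image"), 10, _fs("720P"), False),
    VideoModel("Kling 2.6", _fs("text", "image", "video"), 10, _fs("720P"), True),
    VideoModel("Seedance 2.0", _fs("text", "image", "video", "audio"), 15,
               _fs("2K", "1080P", "720P", "480P"), True),
    VideoModel("Veo 3.1", _fs("text", "image"), 8, _fs("1080P", "720P"), True),
    VideoModel("Wan 2.6", _fs("text", "image", "video", "audio"), 15,
               _fs("1080P", "720P"), True),
    VideoModel("Hailuo 2.3", _fs("text", "image"), 5, _fs("1080P"), False),
]


def find_models(
    *,
    input_type: str,
    min_seconds: int = 0,
    resolution: Optional[str] = None,
    need_audio: bool = False,
) -> List[str]:
    """Return names of models that satisfy every given requirement."""
    return [
        m.name
        for m in MODELS
        if input_type in m.inputs
        and m.max_seconds >= min_seconds
        and (resolution is None or resolution in m.resolutions)
        and (m.native_audio or not need_audio)
    ]
```

For example, `find_models(input_type="video", need_audio=True)` keeps only the rows that accept video input and generate audio natively.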

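How to Use Kling V3.0 on Atlas Cloud

Get started in minutes: follow these simple steps to integrate and deploy models through the Atlas Cloud platform.

Create an Atlas Cloud account

Sign up at atlascloud.ai and complete verification. New users receive free credits to explore the platform and test models.

Why Use Kling V3.0 on Atlas Cloud

Combining the advanced Kling V3.0 models with Atlas Cloud's GPU-accelerated platform delivers strong performance, scalability, and developer experience.

Performance and Flexibility

Low latency:
GPU-optimized inference for real-time responses.

Unified API:
One integration for Kling V3.0, GPT, Gemini, and DeepSeek.

Transparent pricing:
Per-token billing, with serverless support.

Enterprise and Scale

Developer experience:
SDKs, analytics, fine-tuning tools, and templates in one place.

Reliability:
99.99% uptime, RBAC access control, and compliance logging.

Security and compliance:
SOC 2 Type II certification, HIPAA compliance, and US data sovereignty.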

Kling 3.0 API: Frequently Asked Questions

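By combining video subject references, image subject references, and voice/tone references.

The Standard tier balances generation speed and quality and suits social media content and rapid prototyping. The Pro tier is built for professional film and video work, with more realistic physics simulation and finer material textures.

R2V focuses on global restyling, such as converting live-action footage into a specific animated or photorealistic art style. Video Edit focuses on instruction-based modification, with precise post-production operations such as adding, removing, or changing specific elements in a video.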

Kling 3.0 produces clips in the 5 to 10 second range, with resolution options up to 4K on the dedicated 4K models. Standard and Pro tiers cover everyday and high-fidelity work, while the 4K variants are there when you need maximum detail. Set the resolution and duration per request to balance quality, speed, and cost.
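A minimal sketch of setting duration and resolution per request, using only the Python standard library. It builds the HTTP request without sending it. The endpoint path, the JSON field names, and the model ID in the usage note are hypothetical placeholders. The Bearer header follows the usual OpenAI-compatible convention and is an assumption about Atlas Cloud, and the 5 to 10 second check mirrors the range stated above.

```python
import json
import urllib.request


def build_generation_request(
    base_url: str,
    api_key: str,
    model: str,
    prompt: str,
    *,
    duration: int = 5,
    resolution: str = "720p",
) -> urllib.request.Request:
    """Build (but do not send) a video generation request."""
    if not 5 <= duration <= 10:
        raise ValueError("duration must be between 5 and 10 seconds")
    payload = {
        "model": model,
        "prompt": prompt,
        "duration": duration,      # hypothetical field name
        "resolution": resolution,  # hypothetical field name
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/video/generations",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller would pass the base URL and model ID from the Atlas Cloud console, for instance `build_generation_request(base_url, key, "kling-v3.0-std-t2v", prompt, duration=10)` with a placeholder model ID, and then hand the result to `urllib.request.urlopen`.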

Standard balances speed and quality for social content and rapid prototyping. Pro targets professional film and video work, with more realistic physics and finer material detail. Turbo is the accelerated option for faster turnaround. All tiers share the same endpoints, so you can move a job between them without changing your integration.

Kling 3.0 renders crisp, readable text directly in the frame and generates natural lip-sync across several languages, including Chinese, English, Japanese, Korean, and Spanish, with mixed-language delivery in one clip. You can assign dialogue to specific characters so scenes with multiple speakers stay clear, which suits e-commerce, localization, and global marketing.

Kling O3 extracts a subject's appearance and voice from a short 3 to 8 second video or an image, then reproduces that character across new clips with matching lip-sync. Combined with reference images for props and scenes, this keeps a face, build, and voice stable from shot to shot, which is what serialized stories and digital hosts need.

Yes. The Kling O3 video editing endpoint applies natural-language instructions to footage, including object removal and replacement, background changes, and added effects. Reference-to-video also handles broader restyling, such as converting live footage into a different visual style, so you can revise content without regenerating it from scratch.

Generation is asynchronous: each request returns a task ID that you poll until the clip is ready, which fits queues and high-volume pipelines. Rate limits and concurrency vary by account tier, so add exponential backoff and a retry on a 429 response, and contact support to raise limits as you scale. The Enterprise plan offers higher ceilings and custom limits.
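The polling pattern described above can be sketched as follows. The caller supplies `fetch_status`, a function that queries the task and raises `RateLimited` when the API answers 429. That function, the exception class, and the status strings (`"processing"`, `"succeeded"`, `"failed"`) are assumptions for illustration, not the documented Atlas Cloud response format.

```python
import random
import time
from typing import Any, Callable, Dict


class RateLimited(Exception):
    """Raised by the caller's fetch function when the API returns HTTP 429."""


def poll_task(
    fetch_status: Callable[[], Dict[str, Any]],
    *,
    max_attempts: int = 20,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task finishes, backing off exponentially on 429."""
    delay = base_delay
    for _ in range(max_attempts):
        try:
            task = fetch_status()
        except RateLimited:
            # Exponential backoff with jitter so parallel workers spread out.
            sleep(delay + random.uniform(0, delay / 2))
            delay = min(delay * 2, max_delay)
            continue
        delay = base_delay  # a successful call resets the backoff
        status = task.get("status")
        if status == "succeeded":
            return task
        if status == "failed":
            raise RuntimeError(f"generation failed: {task}")
        sleep(base_delay)  # still processing; wait before the next poll
    raise TimeoutError(f"task not finished after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function testable without real waiting, and capping the delay at `max_delay` bounds how long a single retry can stall a queue worker.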

Uploads that contain real human faces are subject to platform content rules and identity protections, and may be restricted. For consistent characters, use Kling O3's subject reference workflow with original or licensed material rather than a real person's photo, and review Atlas Cloud's acceptable use terms before building face-based workflows.

Explore More Series

Seedance 2.0

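The Seedance 2.0 API gives you production-grade access to ByteDance's multimodal video model, with four input modalities (text, image, video, audio) and an industry-leading "Universal Reference" system that locks composition, camera movement, and character motion across shots. Integrate director-level control with a single API call at a flat rate of $0.09/second, with instant key access and no queue, backed by enterprise-grade uptime and compliance. Seedance 2.0 native 4K is now live.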

Grok Imagine

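The Grok Imagine API gives developers xAI's all-in-one suite for image, video, and audio generation. It generates images at up to 2K resolution with multilingual text rendering, and videos up to 15 seconds long with native synchronized audio and reference-image-based editing. On Atlas Cloud, a single key runs every Grok Imagine mode, so you can switch between image, video, and audio without separate setup, starting at $0.02 per image and $0.05 per second.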

Gemini Omni Flash

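The Gemini Omni API brings the multimodal video generation and editing model that Google DeepMind announced at Google I/O 2026 into your stack. Gemini Omni combines Gemini's reasoning engine with generative media: it accepts any combination of text, image, video, and audio input and produces consistent, knowledge-grounded output. Refine results through natural conversation by replacing objects, rewriting scenes, and switching styles while physics, character identity, and visual continuity stay intact. Atlas Cloud offers the full Gemini Omni Flash family through a unified API, covering text-to-video, image-to-video with up to 7 reference images, and reference-to-video, with transparent per-second billing from $0.112 and no subscription. Start building now.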

GPT Image 2

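The GPT Image 2 API gives developers access to OpenAI's latest image model, the successor to GPT Image 1.5. The model generates and edits images, renders text accurately in Latin and CJK scripts, and handles layout well for posters, mockups, and infographics. On Atlas Cloud you can reach it through one unified API alongside 300+ models, with free credits, 99.99% uptime, and no OpenAI organization verification.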

Google

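Google's most powerful creative models are now fully available on Atlas Cloud. Veo 3.1 delivers cinematic video generation, Nano Banana 2 supports high-fidelity image creation, and Gemini brings multimodal intelligence to every workflow. A single API key gives access to the full Google model suite, with Day-0 availability and pay-as-you-go pricing.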

Seedance 2.0 Mini

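Seedance 2.0 Mini brings ByteDance's multimodal video generation to workflows where speed and cost matter most. It delivers the core capabilities of Seedance 2.0 in a lighter footprint: faster generation, lower cost per video, and the same API integration you already use. For teams running high-throughput pipelines or prototyping at scale, Mini is the most practical default.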

ByteDance

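From cinematic video generation to high-fidelity image creation, ByteDance's most powerful models are now live on Atlas Cloud. Run Seedance and Seedream at scale with the lowest inference pricing and zero infrastructure overhead.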

Alibaba

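Atlas Cloud brings Alibaba's full model lineup into one API: Qwen for language and image tasks, and Wan for video generation up to 1080p. All models are pay-as-you-go with no subscription. You can reach the Alibaba API through a single base URL using your existing OpenAI-compatible client.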

OpenAI

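Atlas Cloud gives you access to the full OpenAI API lineup, from GPT Image 2 for image generation to Sora 2 for video. Every model is pay-as-you-go with no monthly spending limit. The API is OpenAI-compatible, so you can connect by swapping the base URL.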

xAI

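Build complete image and video workflows with the xAI API on Atlas Cloud. Generate at 2K resolution, edit with reference images, and animate images into video clips with synchronized audio.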

Kwaivgi

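Kwaivgi API pricing is 15% below standard pricing. Atlas Cloud offers Day-0 access to the latest Kling releases, with pay-as-you-go pricing and no seat limits. One account and one key cover every Kling model from Standard to Master.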

Seedream 5.0 Pro

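The Seedream 5.0 Pro API gives developers ByteDance's controllable image editing model on Atlas Cloud. It targets edits precisely with anchors and coordinates, separates an image into editable layers, blends multiple references, and matches color and material accurately, with multilingual text at 2K and 3K resolution. On Atlas Cloud, a single key is all you need for access.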

One API for every AI modality.
