
Seed Audio 1.0 API by ByteDance
Doubao-Audio-Generate-1.0 is Doubao Voice’s next-generation audio-generation engine. An industry-first commercial tool, it creates film-grade audio from a single prompt and removes much of the manual audio-engineering work. Creators can generate publish-ready radio dramas, podcasts, and branded audio, so the tool acts less like a simple voice generator and more like an AI audio director. It targets audiobooks, serialized episodes, and commercial audio for high-quality, narrative-driven production.
Seed Audio 1.0
Seed Audio 1.0 is ByteDance's audio generation model for producing speech from text prompts, with optional reference audio, speaker, or image inputs. It is exposed on AtlasCloud through the standard asynchronous audio generation API.
Highlights
- Text-to-speech generation: Convert text prompts into speech audio.
- Reference audio control: Provide up to three reference audios or a speaker ID to guide the voice, tone, or delivery. Refer to them in the prompt using the upstream placeholder tokens `@audio1`, `@audio2`, and `@audio3`.
- Reference image control: Provide one reference image to guide the generated audio style or character context.
- Reference exclusivity: Each reference item must contain exactly one of `speaker`, `audio_url`, `audio_data`, `image_url`, or `image_data`. The upstream API rejects mixed audio + image references in the same request.
- Audio format control: Generate `mp3`, `wav`, `pcm`, or `ogg_opus`.
- Sample rate control: Choose common output sample rates from `8000` to `48000`.
- Speech controls: Adjust pitch, speech speed, and loudness with optional rate parameters.
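
The reference rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `validate_references` is a hypothetical helper, and it assumes that the limit of three applies to `audio_url`/`audio_data` items and that at most one image reference is accepted, which is a reading of the highlights rather than confirmed API behavior.

```python
# Keys that may appear in a reference item, grouped by kind.
AUDIO_KEYS = {"audio_url", "audio_data"}
SPEAKER_KEYS = {"speaker"}
IMAGE_KEYS = {"image_url", "image_data"}
ALLOWED_KEYS = AUDIO_KEYS | SPEAKER_KEYS | IMAGE_KEYS


def validate_references(references):
    """Return a list of problems found in a `references` array (empty if none)."""
    problems = []
    audio_count = speaker_count = image_count = 0
    for index, item in enumerate(references):
        present = [key for key in item if key in ALLOWED_KEYS]
        if len(present) != 1:
            problems.append(
                f"reference {index} must contain exactly one of {sorted(ALLOWED_KEYS)}"
            )
            continue
        key = present[0]
        if key in AUDIO_KEYS:
            audio_count += 1
        elif key in SPEAKER_KEYS:
            speaker_count += 1
        else:
            image_count += 1
    # Image references may not be combined with audio or speaker references.
    if image_count and (audio_count or speaker_count):
        problems.append("image references cannot be mixed with audio or speaker references")
    # Assumed limits: three reference audios, one reference image.
    if audio_count > 3:
        problems.append("at most three reference audios are supported")
    if image_count > 1:
        problems.append("at most one reference image is supported")
    return problems
```

An empty result means the array passed these local checks; the upstream API may still apply rules that are not described here.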
Parameters
| Parameter | Required | Description |
|---|---|---|
| `model` | Yes | Use `bytedance/seed-audio-1.0`. |
| `text` | Yes | Text prompt to synthesize into speech. |
| `references` | No | Optional array of reference inputs. Use `audio_url`, `audio_data`, or `speaker` for audio/voice references; use `image_url` or `image_data` for image references. Do not mix image references with audio or speaker references. |
| `format` | No | Output audio format. Default: `mp3`. |
| `sample_rate` | No | Output sample rate. Default: `24000`. |
| `pitch_rate` | No | Pitch adjustment. Default: `0`. |
| `speech_rate` | No | Speech speed adjustment. Default: `0`. |
| `loudness_rate` | No | Loudness adjustment. Default: `0`. |
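
As an illustration of how the parameters and their defaults fit together, the sketch below assembles a request body in Python. `build_request` is a hypothetical helper, not an AtlasCloud function. It only builds and checks the JSON body; the endpoint and authentication are not covered in this section, so no network call is shown. The valid ranges for `pitch_rate`, `speech_rate`, and `loudness_rate` are not documented here and are therefore not validated.

```python
import json

MODEL_ID = "bytedance/seed-audio-1.0"
FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}


def build_request(text, references=None, format="mp3", sample_rate=24000,
                  pitch_rate=0, speech_rate=0, loudness_rate=0):
    """Assemble a request body using the defaults from the parameter table."""
    if not text:
        raise ValueError("text is required")
    if format not in FORMATS:
        raise ValueError(f"unsupported format: {format!r}")
    # Only the 8000 to 48000 range is documented; the exact set of accepted
    # rates is not, so this checks the bounds only (an assumption).
    if not 8000 <= sample_rate <= 48000:
        raise ValueError(f"sample_rate out of range: {sample_rate}")
    body = {
        "model": MODEL_ID,
        "text": text,
        "format": format,
        "sample_rate": sample_rate,
        "pitch_rate": pitch_rate,
        "speech_rate": speech_rate,
        "loudness_rate": loudness_rate,
    }
    # `references` is optional, so it is omitted unless provided.
    if references:
        body["references"] = references
    return body


if __name__ == "__main__":
    print(json.dumps(build_request("Hello, this is a test."), indent=2))
```

Keeping the defaults in the function signature means a text-only call produces the same values the table lists, and callers override only what they need.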
Example Request
Full-featured example with a reference audio and all tunable controls:

    {
      "model": "bytedance/seed-audio-1.0",
      "text": "Use the voice and delivery of @audio1 and say in natural, clear English with a light broadcast tone: Welcome to Seed Audio. This is the most complete reference example.",
      "references": [
        { "audio_url": "https://static.atlascloud.ai/model/example/bytedance-seed-audio-1.0.mp3" }
      ],
      "format": "mp3",
      "sample_rate": 44100,
      "pitch_rate": 2,
      "speech_rate": 15,
      "loudness_rate": 10
    }

Text-only example:

    {
      "model": "bytedance/seed-audio-1.0",
      "text": "Hello, this is a Seed Audio text-to-speech test.",
      "format": "mp3",
      "sample_rate": 24000,
      "pitch_rate": 0,
      "speech_rate": 0,
      "loudness_rate": 0
    }

With a reference audio:

    {
      "model": "bytedance/seed-audio-1.0",
      "text": "Use the voice and delivery of @audio1 and say: The city sounds especially quiet today.",
      "references": [
        { "audio_url": "https://static.atlascloud.ai/model/example/bytedance-seed-audio-1.0.mp3" }
      ],
      "format": "mp3",
      "sample_rate": 24000
    }

With a reference image:

    {
      "model": "bytedance/seed-audio-1.0",
      "text": "Using the mood and character context from the reference image, say in a bright, youthful tone: After the rain stopped, the street lit up again.",
      "references": [
        { "image_url": "https://static.atlascloud.ai/uploads/models/ebeecbb1-1904-464c-ad24-6a631fa83ab6.png" }
      ],
      "format": "mp3",
      "sample_rate": 24000
    }

Pricing
Seed Audio 1.0 is billed by input text length.
| Unit | Price |
|---|---|
| Per 1,000 characters | $0.015 |
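
For budgeting, the per-character rate can be turned into a rough estimate. The sketch below is a minimal illustration under stated assumptions: `estimate_cost` is a hypothetical helper, characters are counted with Python's `len()` (Unicode code points), and the charge is assumed to be pro rata with no rounding or minimum. The actual counting and rounding rules are not specified in this section. Under those assumptions, a 2,500-character script would cost about 2.5 × $0.015 = $0.0375.

```python
from decimal import Decimal

# USD per 1,000 characters of input text, taken from the pricing table.
PRICE_PER_1000_CHARS = Decimal("0.015")


def estimate_cost(text):
    """Estimate the USD cost of synthesizing `text`, assuming pro rata billing."""
    return PRICE_PER_1000_CHARS * len(text) / 1000
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding error when many estimates are summed.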




