Seed Audio 1.0
Text to Speech

Seed Audio 1.0 API by ByteDance

bytedance/seed-audio-1.0

Doubao-Audio-Generate-1.0 is Doubao Voice's next-generation audio-generation engine. It is billed as the first commercial tool to produce film-grade audio from a single prompt, removing much of the manual audio-engineering work. Creators can generate publish-ready radio dramas, podcasts, and branded audio, so the tool acts less like a simple voice generator and more like an AI audio director. It targets audiobooks, serialized episodes, and commercial audio where high-quality, narrative-driven production matters.


Seed Audio 1.0 is ByteDance's audio generation model for producing speech from text prompts, with optional reference audio, speaker, or image inputs. It is exposed on AtlasCloud through the standard asynchronous audio generation API.

Highlights

  • Text-to-speech generation: Convert text prompts into speech audio.
  • Reference audio control: Provide up to three reference audio clips or a speaker ID to guide the voice, tone, or delivery. Refer to the clips in the prompt using the upstream placeholder tokens @audio1, @audio2, and @audio3.
  • Reference image control: Provide one reference image to guide the generated audio style or character context.
  • Reference exclusivity: Each reference item must contain exactly one of speaker, audio_url, audio_data, image_url, or image_data. The upstream API rejects mixed audio + image references in the same request.
  • Audio format control: Generate mp3, wav, pcm, or ogg_opus.
  • Sample rate control: Choose a common output sample rate between 8000 and 48000 Hz.
  • Speech controls: Adjust pitch, speech speed, and loudness with optional rate parameters.
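The reference rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the function name validate_references is hypothetical, and it assumes the three-clip limit applies to audio_url and audio_data items only (the page does not say how a speaker ID counts toward that limit). Keys other than the five documented ones are ignored.

```python
# Field names come from the Highlights list; everything else is illustrative.
AUDIO_CLIP_KEYS = {"audio_url", "audio_data"}
VOICE_KEYS = AUDIO_CLIP_KEYS | {"speaker"}
IMAGE_KEYS = {"image_url", "image_data"}
ALLOWED_KEYS = VOICE_KEYS | IMAGE_KEYS


def validate_references(references):
    """Return a list of problems; an empty list means the references look valid."""
    problems = []
    clip_count = 0
    voice_count = 0
    image_count = 0
    for index, item in enumerate(references):
        present = sorted(key for key in item if key in ALLOWED_KEYS)
        if len(present) != 1:
            problems.append(
                f"reference {index}: expected exactly one of "
                f"{sorted(ALLOWED_KEYS)}, found {present}"
            )
            continue
        key = present[0]
        if key in IMAGE_KEYS:
            image_count += 1
        else:
            voice_count += 1
            if key in AUDIO_CLIP_KEYS:
                clip_count += 1
    if voice_count and image_count:
        problems.append("audio/speaker and image references cannot be mixed")
    if clip_count > 3:  # assumption: the limit counts audio clips only
        problems.append("at most three reference audio clips are allowed")
    if image_count > 1:
        problems.append("at most one reference image is allowed")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.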

Parameters

Parameter | Required | Description
model | Yes | Use bytedance/seed-audio-1.0.
text | Yes | Text prompt to synthesize into speech.
references | No | Optional array of reference inputs. Use audio_url, audio_data, or speaker for audio/voice references; use image_url or image_data for image references. Do not mix image references with audio or speaker references.
format | No | Output audio format. Default: mp3.
sample_rate | No | Output sample rate. Default: 24000.
pitch_rate | No | Pitch adjustment. Default: 0.
speech_rate | No | Speech speed adjustment. Default: 0.
loudness_rate | No | Loudness adjustment. Default: 0.
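As an illustration of how these parameters and their defaults fit together, here is a small sketch of a request-body builder. The helper name build_request and the audio_format argument name are hypothetical. The 8000 to 48000 range check is a loose assumption based on the sample-rate highlight, since the page does not list the exact accepted rates.

```python
import json

MODEL_ID = "bytedance/seed-audio-1.0"
ALLOWED_FORMATS = ("mp3", "wav", "pcm", "ogg_opus")


def build_request(text, references=None, audio_format="mp3", sample_rate=24000,
                  pitch_rate=0, speech_rate=0, loudness_rate=0):
    """Assemble a request body using the documented defaults."""
    if not text:
        raise ValueError("text is required")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    # Assumption: any value in this range is accepted; the real API may
    # only allow a fixed set of common rates.
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000")
    payload = {
        "model": MODEL_ID,
        "text": text,
        "format": audio_format,
        "sample_rate": sample_rate,
        "pitch_rate": pitch_rate,
        "speech_rate": speech_rate,
        "loudness_rate": loudness_rate,
    }
    if references:
        # references is optional, so it is omitted when empty
        payload["references"] = list(references)
    return payload


if __name__ == "__main__":
    print(json.dumps(build_request("Hello, this is a test."), indent=2))
```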

Example Request

Full-featured example with a reference audio and all tunable controls:

{
  "model": "bytedance/seed-audio-1.0",
  "text": "Use the voice and delivery of @audio1 and say in natural, clear English with a light broadcast tone: Welcome to Seed Audio. This is the most complete reference example.",
  "references": [
    { "audio_url": "https://static.atlascloud.ai/model/example/bytedance-seed-audio-1.0.mp3" }
  ],
  "format": "mp3",
  "sample_rate": 44100,
  "pitch_rate": 2,
  "speech_rate": 15,
  "loudness_rate": 10
}

Text-only example:

{
  "model": "bytedance/seed-audio-1.0",
  "text": "Hello, this is a Seed Audio text-to-speech test.",
  "format": "mp3",
  "sample_rate": 24000,
  "pitch_rate": 0,
  "speech_rate": 0,
  "loudness_rate": 0
}

With a reference audio:

{
  "model": "bytedance/seed-audio-1.0",
  "text": "Use the voice and delivery of @audio1 and say: The city sounds especially quiet today.",
  "references": [
    { "audio_url": "https://static.atlascloud.ai/model/example/bytedance-seed-audio-1.0.mp3" }
  ],
  "format": "mp3",
  "sample_rate": 24000
}
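The request above uses the @audio1 token alongside a single audio reference. A quick client-side check can catch prompts that mention a placeholder with no matching clip. This sketch assumes that @audioN refers to the Nth audio_url or audio_data item in array order, which the page implies but does not state. Both function names are hypothetical.

```python
import re

# Matches @audio1, @audio2, @audio3 but not longer tokens such as @audio10.
PLACEHOLDER = re.compile(r"@audio([1-3])\b")


def placeholder_indices(text):
    """Return the sorted, de-duplicated placeholder numbers used in the prompt."""
    return sorted({int(number) for number in PLACEHOLDER.findall(text)})


def unmatched_placeholders(text, references):
    """Return placeholder numbers that have no corresponding audio reference.

    Assumption: @audioN maps to the Nth audio clip in the references array.
    """
    clips = [ref for ref in references if "audio_url" in ref or "audio_data" in ref]
    return [n for n in placeholder_indices(text) if n > len(clips)]
```

For instance, a prompt that mentions @audio2 while only one clip is supplied would be reported as [2].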

With a reference image:

{
  "model": "bytedance/seed-audio-1.0",
  "text": "Using the mood and character context from the reference image, say in a bright, youthful tone: After the rain stopped, the street lit up again.",
  "references": [
    { "image_url": "https://static.atlascloud.ai/uploads/models/ebeecbb1-1904-464c-ad24-6a631fa83ab6.png" }
  ],
  "format": "mp3",
  "sample_rate": 24000
}

Pricing

Seed Audio 1.0 is billed by input text length.

Unit | Price
Per 1,000 characters | $0.015
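For budgeting, the per-character rate can be turned into a rough estimate. This is a sketch under two assumptions not confirmed by the page: that characters are counted the way Python's len() counts them (Unicode code points), and that usage is prorated rather than rounded up to the next 1,000 characters. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

PRICE_PER_1000_CHARS = Decimal("0.015")  # USD, from the pricing table


def estimate_cost(text):
    """Estimate the charge in USD for synthesizing the given text."""
    # Decimal avoids binary floating-point drift in currency arithmetic.
    return PRICE_PER_1000_CHARS * Decimal(len(text)) / Decimal(1000)
```

Actual invoices may differ if the provider counts characters or rounds usage differently.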
