
Seed Audio 1.0

Clone a voice from a clip, or generate a whole scene of audio in one pass.

Seed Audio 1.0 is ByteDance's multimodal audio model, from the team behind Seedance. It does two jobs in one endpoint. Run it as clean text-to-speech for narration, or as cinematic scene audio that layers multi-role dialogue, sound effects, ambience, and music together. The standout is zero-shot voice cloning: attach up to three reference clips and Seed Audio matches each voice, so a character sounds the same across every line. It supports English and Chinese today.

How it works

01

Describe your vision

Type a detailed prompt, and attach up to three reference voice clips if you want to clone a speaker.

02

Choose your settings

Pick your mode (text-to-speech or scene audio), sample rate, and output format. See the credit cost before you generate.

03

Generate in seconds

Your audio is delivered in seconds. Download, iterate, or pipe into video.

Ready to create with Seed Audio 1.0?

Jump into the Studio and start generating. Plans from £10/month.

Voice cloning and scene audio from one model

Seed Audio 1.0 brings the Seedance team's reference-driven approach to sound. Attach up to three voice clips and reference them directly in your prompt, and the model holds each speaker's identity across every line it generates. That keeps a narrator, a character, or a brand voice consistent without booking the same person twice.

Beyond speech, Seed Audio composes whole scenes. Ask for a late-night convenience store and you get the hum of the fridge, the door chime, footsteps, and two voices talking, all balanced in a single pass. For motion work, product films, and social edits, that collapses a sound-design session into one prompt.

Use it as plain text-to-speech when you just need narration, or open it up to full cinematic audio when the scene calls for it. Output runs up to two minutes per pass at your choice of sample rate and format.

Clone up to three voices

Attach reference clips of up to thirty seconds each and call them in your prompt as the first, second, and third voice. Seed Audio matches the timbre and delivery, so casts stay consistent shot to shot.

Speech or full scene

Switch between clean text-to-speech with no background and cinematic audio that layers dialogue, effects, ambience, and music. One model, two modes, no extra passes.

Frequently asked

Questions about Seed Audio 1.0.

How does voice cloning work?

Attach up to three audio clips of up to thirty seconds each, then reference them in your prompt as the first, second, or third voice. Seed Audio matches each one so the same speaker carries across every line.

Built differently

Why Stensyl?