
Clone a voice from a clip, or generate a whole scene of audio in one pass.
Seed Audio 1.0 is ByteDance's multimodal audio model, from the team behind Seedance. It does two jobs in one endpoint. Run it as clean text-to-speech for narration, or as cinematic scene audio that layers multi-role dialogue, sound effects, ambience, and music together. The standout is zero-shot voice cloning: attach up to three reference clips and Seed Audio matches each voice, so a character sounds the same across every line. It supports English and Chinese today.
Type a detailed prompt and, if you want cloned voices, attach up to three reference clips.
Pick your mode, sample rate, and format. See the credit cost before you generate.
Download your audio, iterate, or pipe it into video.
Jump into the Studio and start generating. Plans from £10/month.
Seed Audio 1.0 brings the Seedance team's reference-driven approach to sound. Attach up to three voice clips and reference them directly in your prompt, and the model holds each speaker's identity across every line it generates. That keeps a narrator, a character, or a brand voice consistent without booking the same person twice.
Beyond speech, Seed Audio composes. Ask for a late-night convenience store and you get the hum of the fridge, the door chime, footsteps, and two voices talking, all balanced in a single pass. For motion work, product films, and social edits, that collapses a sound-design session into one prompt.
Use it as plain text-to-speech when you just need narration, or open it up to full cinematic audio when the scene calls for it. Output runs up to two minutes per pass at your choice of sample rate and format.
Attach reference clips of up to thirty seconds each and call them in your prompt as the first, second, and third voice. Seed Audio matches the timbre and delivery, so casts stay consistent shot to shot.
Switch between clean text-to-speech with no background, and cinematic audio that layers dialogue, effects, ambience, and music. One model, two modes, no extra passes.
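For anyone scripting against the model, the limits described above (up to three reference clips, up to thirty seconds each, up to two minutes of output, two modes) can be checked before a request is sent. The sketch below is illustrative only: `AudioRequest`, `ReferenceClip`, the mode labels `"tts"` and `"cinematic"`, and every field name are hypothetical and do not come from a published Seed Audio API. It makes no network call; it only validates a request against the limits stated on this page.

```python
from dataclasses import dataclass, field

# Limits taken from the copy above; all names here are hypothetical.
MAX_REFERENCE_CLIPS = 3       # "up to three reference clips"
MAX_CLIP_SECONDS = 30.0       # "up to thirty seconds each"
MAX_OUTPUT_SECONDS = 120.0    # "up to two minutes per pass"
MODES = ("tts", "cinematic")  # assumed labels for the two modes


@dataclass
class ReferenceClip:
    path: str
    duration_seconds: float


@dataclass
class AudioRequest:
    prompt: str
    mode: str = "tts"
    output_seconds: float = 30.0
    references: list[ReferenceClip] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request fits the limits."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if self.mode not in MODES:
            problems.append(f"unknown mode {self.mode!r}, expected one of {MODES}")
        if not 0 < self.output_seconds <= MAX_OUTPUT_SECONDS:
            problems.append(f"output must be between 0 and {MAX_OUTPUT_SECONDS:.0f} seconds")
        if len(self.references) > MAX_REFERENCE_CLIPS:
            problems.append(f"at most {MAX_REFERENCE_CLIPS} reference clips are allowed")
        for index, clip in enumerate(self.references, start=1):
            if clip.duration_seconds > MAX_CLIP_SECONDS:
                problems.append(f"reference clip {index} is longer than {MAX_CLIP_SECONDS:.0f} seconds")
        return problems


if __name__ == "__main__":
    request = AudioRequest(
        prompt="Late-night convenience store. The first voice greets the second voice.",
        mode="cinematic",
        output_seconds=45.0,
        references=[
            ReferenceClip("clerk.wav", 12.5),
            ReferenceClip("customer.wav", 28.0),
        ],
    )
    print(request.validate())
```

Collecting every problem in a list, rather than raising on the first one, lets a tool show all the fixes a request needs in one go.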
Professional audio generation. Plans from £10/month.