What languages does it support?

English and Chinese at launch, with wider language coverage on ByteDance's roadmap.

How long can the audio be?

Up to two minutes of output per generation. You are billed on the actual length produced, so short lines cost less.

Seed Audio 1.0

Clone a voice from a clip, or generate a whole scene of audio in one pass.

Seed Audio 1.0 is ByteDance's multimodal audio model, from the team behind Seedance. It does two jobs in one endpoint. Run it as clean text-to-speech for narration, or as cinematic scene audio that layers multi-role dialogue, sound effects, ambience, and music together. The standout is zero-shot voice cloning: attach up to three reference clips and Seed Audio matches each voice, so a character sounds the same across every line. English and Chinese today.

Try these prompts

“A calm narrator introduces a new range of timber-framed homes, warm and unhurried.”

“Late-night convenience store. Fridge hum, a door chime, two friends deciding what to buy.”

“@Audio1 reads the opening line of the brand film, confident and bright.”

“Rain on a city street at night, distant traffic, a single set of footsteps approaching.”

“Two characters argue across a kitchen table, tension rising, plates clinking.”

“A market at dawn: stalls setting up, gulls overhead, a vendor calling out prices.”

How it works

Describe your vision

Type a detailed prompt, and attach up to three reference voice clips if you want cloned voices.

Choose your settings

Pick plain text-to-speech or full scene audio, then your sample rate and format. See the credit cost before you generate.

Generate in seconds

Your audio is delivered in seconds. Download, iterate, or pipe into video.

Ready to create with Seed Audio 1.0?

Jump into the Studio and start generating. Plans from £10/month.

Voice cloning and scene audio from one model

Seed Audio 1.0 brings the Seedance team's reference-driven approach to sound. Attach up to three voice clips and reference them directly in your prompt, and the model holds each speaker's identity across every line it generates. That keeps a narrator, a character, or a brand voice consistent without booking the same person twice.

Beyond speech, Seed Audio composes. Ask for a late-night convenience store and you get the hum of the fridge, the door chime, footsteps, and two voices talking, all balanced in a single pass. For motion work, product films, and social edits, that collapses a sound-design session into one prompt.

Use it as plain text-to-speech when you just need narration, or open it up to full cinematic audio when the scene calls for it. Output runs up to two minutes per pass at your choice of sample rate and format.

Clone up to three voices

Attach reference clips of up to thirty seconds each and call them in your prompt as the first, second, and third voice. Seed Audio matches the timbre and delivery, so casts stay consistent shot to shot.
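
As a rough illustration of the limits described on this page, the sketch below checks a request before it is sent. It is not Stensyl's or ByteDance's API: the class names, field names, and mode labels are hypothetical, and it assumes that the `@Audio1` tag seen in the sample prompts generalises to `@Audio2` and `@Audio3` for the second and third clips.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Limits taken from the page copy.
MAX_REFERENCE_CLIPS = 3      # "up to three reference clips"
MAX_CLIP_SECONDS = 30.0      # "up to thirty seconds each"
MAX_OUTPUT_SECONDS = 120.0   # "up to two minutes of output per generation"

# Hypothetical labels for the two modes (plain speech vs. full scene audio).
MODES = ("speech", "scene")


@dataclass
class ReferenceClip:
    path: str
    duration_seconds: float


@dataclass
class AudioRequest:
    prompt: str
    mode: str = "speech"
    target_seconds: float = 30.0
    clips: List[ReferenceClip] = field(default_factory=list)


def validate(request: AudioRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if request.mode not in MODES:
        problems.append(f"unknown mode {request.mode!r}")
    if not 0 < request.target_seconds <= MAX_OUTPUT_SECONDS:
        problems.append("target length must be above 0 and at most 120 seconds")
    if len(request.clips) > MAX_REFERENCE_CLIPS:
        problems.append("at most 3 reference clips are allowed")
    for index, clip in enumerate(request.clips, start=1):
        if clip.duration_seconds > MAX_CLIP_SECONDS:
            problems.append(f"clip {index} is longer than 30 seconds")
    # Assumed convention: every @AudioN tag must point at an attached clip.
    for match in re.finditer(r"@Audio(\d+)", request.prompt):
        n = int(match.group(1))
        if not 1 <= n <= len(request.clips):
            problems.append(f"prompt references @Audio{n} but no such clip is attached")
    return problems


if __name__ == "__main__":
    request = AudioRequest(
        prompt="@Audio1 reads the opening line of the brand film, confident and bright.",
        clips=[ReferenceClip("narrator.wav", 12.0)],
    )
    print(validate(request))
```

Checking locally like this catches an over-long clip or a tag with no matching clip before any credits are spent.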

Speech or full scene

Switch between clean text-to-speech with no background, and cinematic audio that layers dialogue, effects, ambience, and music. One model, two modes, no extra passes.

Frequently asked

Questions about Seed Audio 1.0.

Attach up to three audio clips of up to thirty seconds each, then reference them in your prompt as the first, second, or third voice. Seed Audio matches each one so the same speaker carries across every line.

Built differently

Why Stensyl?

Because creative work doesn't live in one box. A real project spans research, writing, image, video, 3D, motion graphics, editing, audio, and a way to publish it all. Stensyl puts every piece under one roof: dedicated studios for Film, Graphics, Canvas, 3D, 3D Worlds, Motion, Editing, Web, Social, and App, plus Generate for one-shot work, Projects to keep everything tied together, Workflows for repeatable pipelines, Research backed by Perplexity, and Write for proper documents. One login, one credit balance, one bill, one place where your work actually compounds. You stop paying five subscriptions for tools that don't talk to each other.

Yes, genuinely. Sign up with no card and you get 150 credits to spend across the platform, plus one free video render to land your first result. It's a real free account, not a time-limited trial: the credits don't expire on a countdown, so you can use them whenever you like. Every model is included on the free tier; the only limit is your credit balance, not access. When you've used them up, top up or pick a plan. No card is needed until you choose to.

Works well with

MiniMax Speech 2.6

300+ voices, 30+ languages.

ElevenLabs Audio

49 voices plus sound effects.

Stable Audio 2.5

SFX and ambient up to 190 seconds.

MiniMax Music

Add original music too.