Seedance 2.0

ByteDance's flagship. 15s video with native audio in a single generation.

ByteDance's Multi-Modal Diffusion Transformer. Dual-branch architecture generates video and audio simultaneously with millisecond synchronisation. Up to 15 seconds at up to 1080p with native stereo audio. The multi-modal reference system accepts up to 9 reference images for character and scene consistency. The Flow Matching framework delivers 30% faster generation than v1.5.

Example outputs

Seedance 2.0 example 1

Armoured hero landing on a rain-soaked rooftop, shockwave cracking concrete, cape billowing, lightning illuminating a neon cityscape behind, dramatic low angle, volumetric rain, Unreal Engine 5 cinematic

Seedance 2.0 example 2

High-speed car chase along a sun-drenched Miami coastal highway, matte black supercar drifting sideways through an intersection, tyre smoke, palm trees blurring, helicopter tracking shot, golden hour lens flare

Seedance 2.0 example 3

Anime warrior standing on a floating crystal platform above clouds, glowing energy sword raised, a colossal dragon emerging from the storm below, cel-shaded rendering, volumetric god rays, epic fantasy

Seedance 2.0 example 4

Cyberpunk samurai walking through a neon-drenched market street in futuristic Tokyo, holographic ads floating overhead, rain cascading off a translucent umbrella, katana on back, atmospheric fog, chrome reflections

Seedance 2.0 example 5

Open-world gameplay shot: figure on a motorcycle cresting a hill overlooking a vast coastal city at sunset, ocean to the horizon, winding highway below, photorealistic, cinematic colour grading

Seedance 2.0 example 6

Colossal mech robot emerging from stormy ocean waves, searchlights cutting through spray and fog, fighter jets banking away, lightning illuminating armour plating, dramatic low angle, IMAX scale, teal and orange

How it works

01

Describe your scene

Type a detailed prompt describing the video you want, or upload a reference image as a starting frame.

02

Choose your settings

Pick your resolution and duration. See the credit cost before you generate.

03

Generate your video

Your video is ready in 1-3 minutes. Download, iterate, or extend the sequence.

Ready to create with Seedance 2.0?

Jump into the Studio and start generating. Plans from £10/month.

Choose a Plan

Cinema-grade AI video with native audio.

Seedance 2.0 is ByteDance's flagship video generation model. It uses a dual-branch Multi-Modal Diffusion Transformer: one branch generates video frames, the other generates audio waveforms, connected by a cross-attention bridge that synchronises them at every step. The result is video and audio created together in a single pass, not audio bolted on after the fact.
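ByteDance has not published the bridge's internals, but the idea can be sketched with standard scaled dot-product cross-attention: video tokens act as queries and attend to audio tokens as keys and values, so each frame's representation is conditioned on the evolving soundtrack at every denoising step. A minimal numpy sketch, with toy dimensions chosen for illustration:

```python
import numpy as np

def cross_attention(video_tokens, audio_tokens):
    """One cross-attention step: video tokens (queries) attend to
    audio tokens (keys/values). Toy sketch of the general technique;
    Seedance's actual bridge design is not public."""
    d = video_tokens.shape[-1]
    scores = video_tokens @ audio_tokens.T / np.sqrt(d)   # (Tv, Ta)
    # Softmax over the audio axis: each video token mixes audio tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ audio_tokens                          # (Tv, d)

video = np.random.randn(16, 64)   # 16 video tokens, dim 64
audio = np.random.randn(32, 64)   # 32 audio tokens, dim 64
out = cross_attention(video, audio)
print(out.shape)  # (16, 64)
```

Because the attention runs inside the generation loop rather than after it, synchronisation is a property of the sampling process itself, not a post-hoc alignment step.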

The multi-modal reference system is the standout capability. Feed Seedance 2.0 up to 9 reference images for character appearance and scene consistency. Use @1, @2 etc. in your prompt to direct specific references: '@1 walks through the market while @2 watches from a balcony.' The model decouples content from motion, letting you combine a character from one reference with camera movement from another. This is directing, not prompting.
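The @-token convention above is just positional substitution: @1 refers to the first uploaded reference, @2 to the second, and so on. A hypothetical helper (not a Stensyl or ByteDance API) that expands those placeholders into labels, e.g. to preview which image each token resolves to:

```python
import re

def expand_references(prompt, references):
    """Expand @1/@2-style placeholders into reference labels.
    Illustrative only: @n maps to the n-th uploaded reference image."""
    def repl(match):
        idx = int(match.group(1)) - 1   # @1 is the first reference
        return f"[{references[idx]}]"
    return re.sub(r"@(\d+)", repl, prompt)

refs = ["knight.png", "queen.png"]
prompt = "@1 walks through the market while @2 watches from a balcony"
print(expand_references(prompt, refs))
# [knight.png] walks through the market while [queen.png] watches from a balcony
```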

Output reaches 15 seconds at up to 1080p and 24fps with native stereo audio. Flow Matching replaces traditional Gaussian diffusion with a more direct mathematical path from noise to output, delivering 30% faster generation than Seedance 1.5 while improving quality. 480p and 720p options are available for faster iteration at lower credit cost.
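That "direct path" has a concrete meaning: flow matching trains a velocity field to regress the straight-line displacement from a noise sample to a data sample, then integrates that field at generation time. A toy 1-D sketch of the idea (a closed-form constant velocity stands in for the neural network; Seedance's actual training objective is not public):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise samples (standard normal) and toy "data" samples near 3.0
x0 = rng.standard_normal(1000)
x1 = rng.normal(3.0, 0.5, 1000)
t = rng.uniform(0, 1, 1000)

# Straight interpolation path used by flow matching
x_t = (1 - t) * x0 + t * x1

# Regression target at (x_t, t): the constant displacement x1 - x0
target_v = x1 - x0

# Stand-in for a learned velocity network: least-squares constant fit
v_hat = target_v.mean()

# Sampling = integrating the field from noise toward data (one Euler step)
samples = x0 + v_hat * 1.0
print(samples.mean())   # lands near the data mean of 3.0
```

The contrast with Gaussian diffusion is that there is no noise schedule to reverse step by step; the model learns a field whose trajectories run straight from noise to data, which is where the speed gain comes from.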

Audio and video in one pass

Most AI video models generate silent video and add audio separately. Seedance 2.0 generates both simultaneously. Lip movements map to phonemes, sound effects land on the correct frame, music follows visual rhythm natively. For dialogue, product narrations, and music-driven content, this eliminates the post-production audio sync step entirely.

Multi-modal references

Upload up to 9 reference images for character appearance and scene consistency. Use @1, @2 in your prompt to call specific references. Seedance 2.0 combines them into a coherent generation, decoupling content from motion so you can direct each reference independently.

Content filters: what to expect

Seedance 2.0 has the strictest content filters of any video model. ByteDance's filters reject roughly 1 in 3 prompts, including many innocent ones. Cinematic language (camera angles, lighting, lens specs) passes more reliably than plain descriptions. If a generation fails, Stensyl explains why and offers one-click retry with Kling O3 or Veo 3.1. That is the advantage of a multi-model platform.

Frequently asked

Questions about Seedance 2.0.

Built differently

Why Stensyl?

A small indie studio building creative tools the way they should be built. No VC theatre, no funnel games, no faceless support.