
ByteDance's flagship. 15s video with native audio in a single generation.
ByteDance's Multi-Modal Diffusion Transformer. Dual-branch architecture generates video and audio simultaneously with millisecond synchronisation. Up to 15 seconds at 720p with native stereo audio. Multi-modal reference system accepts up to 9 reference images for character and scene consistency. Flow Matching framework delivers 30% faster generation than v1.5.

“Armoured hero landing on a rain-soaked rooftop, shockwave cracking concrete, cape billowing, lightning illuminating a neon cityscape behind, dramatic low angle, volumetric rain, Unreal Engine 5 cinematic”

“High-speed car chase along a sun-drenched Miami coastal highway, matte black supercar drifting sideways through an intersection, tyre smoke, palm trees blurring, helicopter tracking shot, golden hour lens flare”

“Anime warrior standing on a floating crystal platform above clouds, glowing energy sword raised, a colossal dragon emerging from the storm below, cel-shaded rendering, volumetric god rays, epic fantasy”

“Cyberpunk samurai walking through a neon-drenched market street in futuristic Tokyo, holographic ads floating overhead, rain cascading off a translucent umbrella, katana on back, atmospheric fog, chrome reflections”

“Open-world gameplay shot: figure on a motorcycle cresting a hill overlooking a vast coastal city at sunset, ocean to the horizon, winding highway below, photorealistic, cinematic colour grading”

“Colossal mech robot emerging from stormy ocean waves, searchlights cutting through spray and fog, fighter jets banking away, lightning illuminating armour plating, dramatic low angle, IMAX scale, teal and orange”
Type a detailed prompt describing the video you want, or upload a reference image as a starting frame.
Pick your resolution and duration. See the credit cost before you generate.
Your video is ready in 1-3 minutes. Download, iterate, or extend the sequence.
Jump into the Studio and start generating. Plans from £10/month.
Choose a Plan
Seedance 2.0 is ByteDance's flagship video generation model. It uses a dual-branch Multi-Modal Diffusion Transformer: one branch generates video frames, the other generates audio waveforms, connected by a cross-attention bridge that synchronises them at every step. The result is video and audio created together in a single pass, not audio bolted on after the fact.
The multi-modal reference system is the standout capability. Feed Seedance 2.0 up to 9 reference images for character appearance and scene consistency. Use @1, @2 etc. in your prompt to direct specific references: '@1 walks through the market while @2 watches from a balcony.' The model decouples content from motion, letting you combine a character from one reference with camera movement from another. This is directing, not prompting.
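To make the @1, @2 convention concrete, here is a minimal sketch of a pre-flight check you could run on a prompt before submitting it. It is not part of Seedance or Stensyl: the function names are hypothetical, and the assumption that @N refers to the Nth uploaded image (counting from 1) is inferred from the example above.

```python
import re

MAX_REFERENCES = 9  # upper limit on reference images, as stated in the text


def referenced_slots(prompt: str) -> list[int]:
    """Return the distinct @N slot numbers used in a prompt, ascending."""
    return sorted({int(n) for n in re.findall(r"@(\d+)", prompt)})


def check_prompt(prompt: str, uploaded: int) -> list[str]:
    """Return a list of problems; an empty list means the prompt is consistent.

    Assumption: @N points at the Nth uploaded image, counting from 1.
    """
    problems = []
    if uploaded > MAX_REFERENCES:
        problems.append(f"{uploaded} images uploaded, limit is {MAX_REFERENCES}")
    for slot in referenced_slots(prompt):
        if slot < 1 or slot > uploaded:
            problems.append(f"@{slot} has no matching reference image")
    return problems


prompt = "@1 walks through the market while @2 watches from a balcony."
print(check_prompt(prompt, uploaded=2))  # []
print(check_prompt(prompt, uploaded=1))  # ['@2 has no matching reference image']
```

Catching a dangling @2 locally is cheaper than spending credits on a generation that ignores the reference.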
Output reaches 15 seconds at up to 1080p with 24fps and native dual-channel stereo audio. Flow Matching replaces traditional Gaussian diffusion with a more direct mathematical path from noise to output, delivering 30% faster generation than Seedance 1.5 while improving quality. 480p and 720p options are available for faster iteration at lower credit cost.
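For the curious, the "more direct path" idea can be illustrated with a toy example. This is a sketch of the generic straight-line flow matching formulation, not ByteDance's implementation: the sample moves from noise to output along a line, and generation integrates a velocity field in a few Euler steps. In a real model a trained network predicts the velocity; here an oracle that already knows the target stands in for it, and all names are illustrative.

```python
def interpolate(noise, target, t):
    """Point on the straight path at time t: (1 - t) * noise + t * target."""
    return [(1 - t) * n + t * x for n, x in zip(noise, target)]


def velocity(noise, target):
    """Velocity along the straight path; constant in t, equal to target - noise."""
    return [x - n for n, x in zip(noise, target)]


def euler_sample(noise, velocity_fn, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.5, -1.0, 2.0]
target = [1.0, 0.0, -1.0]

# Oracle stand-in for a trained network: returns the true straight-line velocity.
oracle = lambda x, t: velocity(noise, target)

print(interpolate(noise, target, 0.5))        # [0.75, -0.5, 0.5]
print(euler_sample(noise, oracle, steps=4))   # [1.0, 0.0, -1.0]
```

Because the path is a straight line, even four coarse steps land on the target here. That is the intuition behind needing fewer sampling steps, though the 30% figure above is the vendor's claim and is not something this toy demonstrates.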
Most AI video models generate silent video and add audio separately. Seedance 2.0 generates both simultaneously. Lip movements map to phonemes, sound effects land on the correct frame, and music follows the visual rhythm natively. For dialogue, product narrations, and music-driven content, this eliminates the post-production audio sync step entirely.
Upload up to 9 reference images for character appearance and scene consistency. Use @1, @2 in your prompt to call specific references. Seedance 2.0 combines them into a coherent generation, decoupling content from motion so you can direct each reference independently.
Seedance 2.0 has the strictest content filters of any video model. ByteDance's filters reject roughly 1 in 3 prompts, including many innocent ones. Cinematic language (camera angles, lighting, lens specs) passes more reliably than plain descriptions. If a generation fails, Stensyl explains why and offers one-click retry with Kling O3 or Veo 3.1. That is the advantage of a multi-model platform.
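The retry flow described above amounts to a simple fallback chain. The sketch below shows the idea only: it is not Stensyl's code or API, the exception class, function names, and stub generator are all hypothetical, and the fallback order is an assumption (the text names the two alternatives but not their order).

```python
class ContentFilterRejection(Exception):
    """Hypothetical error raised when a model's content filter rejects a prompt."""


# Model names come from the text above; the order is an assumption.
FALLBACK_ORDER = ["Seedance 2.0", "Kling O3", "Veo 3.1"]


def generate_with_fallback(prompt, generate, models=FALLBACK_ORDER):
    """Try each model in turn and return (model, video, failures).

    `generate` is any callable taking (model, prompt) that either returns a
    video or raises ContentFilterRejection. `failures` records why earlier
    models refused, so the reason can be shown to the user.
    """
    failures = []
    for model in models:
        try:
            return model, generate(model, prompt), failures
        except ContentFilterRejection as exc:
            failures.append((model, str(exc)))
    return None, None, failures


def fake_generate(model, prompt):
    """Stub for illustration: the first model always rejects the prompt."""
    if model == "Seedance 2.0":
        raise ContentFilterRejection("prompt rejected by content filter")
    return f"video from {model}"


model, video, failures = generate_with_fallback("rainy rooftop, low angle", fake_generate)
print(model)     # Kling O3
print(failures)  # [('Seedance 2.0', 'prompt rejected by content filter')]
```

On the platform the retry is a one-click choice rather than an automatic loop, so treat this as a picture of the logic, not of the interface.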
A small indie studio building creative tools the way they should be built. No VC theatre, no funnel games, no faceless support.
Professional video generation. Plans from £10/month.