Runway Act Two Review: AI Motion Direction for Creatives

By Adam Morgan · 15 May 2026 · 10 min read

Runway Act Two promises director-level control over AI video. Here's what it actually delivers for film, set, and motion designers.

What Act Two Actually Does (Beyond the Marketing)

Runway's Act Two is not a text-to-video tool in the way most creatives currently understand that term. It sits inside Runway's Gen-4 system and works as a performance transfer layer: you supply a driving video of a human performance, and Act Two maps that facial expression, mouth movement, and gesture data onto a character reference. The output is generated video shaped by directorial intent, not just a visual response to a written prompt.

That distinction matters enormously. Standard text-to-video tools ask: what should this look like? Act Two asks: how should this character move, and how should the camera feel around them? The shift is from visual description to directorial instruction.

In practice, Act Two accepts a character reference image or video alongside a motion prompt. You can specify camera behaviour using cinematographic language: a slow dolly, a rack focus pull, an establishing wide. Independent testing reports output at up to 10 seconds of footage, at resolutions of 1280×768 or 768×1280, running at 24 fps. Render times reported in third-party reviews sit around 45 seconds for five seconds of footage and roughly 90 seconds for a ten-second clip. Those figures come from external test environments rather than a primary Runway specification sheet, so treat them as indicative rather than guaranteed.
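The reported figures reduce to simple arithmetic. The sketch below assumes render time scales linearly with clip length. The two third-party data points (45 seconds for 5 seconds of footage, 90 seconds for 10) are consistent with that but do not prove it. The function names are illustrative and not part of any Runway API.

```python
# Figures are the third-party numbers quoted above, not Runway specifications.
FPS = 24
MAX_CLIP_SECONDS = 10
RESOLUTIONS = {"landscape": (1280, 768), "portrait": (768, 1280)}
# 45 s render / 5 s footage and 90 s / 10 s both give 9 s of render per second.
RENDER_SECONDS_PER_FOOTAGE_SECOND = 45 / 5


def _check_length(clip_seconds: float) -> None:
    """Reject clip lengths outside the reported 10-second ceiling."""
    if not 0 < clip_seconds <= MAX_CLIP_SECONDS:
        raise ValueError(f"clip length must be in (0, {MAX_CLIP_SECONDS}] seconds")


def frame_count(clip_seconds: float) -> int:
    """Number of frames in a clip at the reported 24 fps."""
    _check_length(clip_seconds)
    return round(clip_seconds * FPS)


def estimated_render_seconds(clip_seconds: float) -> float:
    """Rough render estimate, ASSUMING linear scaling between the two data points."""
    _check_length(clip_seconds)
    return clip_seconds * RENDER_SECONDS_PER_FOOTAGE_SECOND


if __name__ == "__main__":
    for seconds in (5, 10):
        print(seconds, frame_count(seconds), estimated_render_seconds(seconds))
```

On these numbers a render takes about nine times the running length of the clip, which is worth keeping in mind when planning iteration loops.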

What Act Two does well, according to independent reviewers: character consistency across a clip, performance transfer, and cinematic motion quality. Where it struggles: hand and body distortion on complex gestures, geometry collapse in detailed interior scenes, and imperfect replication of subtle facial expressiveness. These are not edge cases. They are consistent failure modes to plan around.

One boundary worth drawing clearly before going further: Act Two is a previs and motion-direction layer. It produces footage clips. It does not replace compositing, colour grading, or editorial tools. None of the reviews consulted positions it as a post-production suite, and using it as one will lead to frustration. Its job is to show how something moves and feels on camera, not to deliver broadcast-ready material.

Act Two is a directorial tool, not a generative one in the conventional sense. The input is a performance; the output is a camera-driven interpretation of that performance applied to a character reference.

Film and Set Designers: Pre-Visualisation Without a Shoot

For film and set designers, the most defensible use case for Act Two is pre-visualisation of camera movement through a space before construction locks the budget. That is an inference drawn from Act Two's motion-transfer and camera-consistency capabilities, not a claim Runway makes explicitly. But the logic holds, and it reflects how the tool behaves in practice.

Consider a period interior: a drawing room set with practical firelight, dressed with period furniture and deep shadow. Before a single flat is built, a production designer can generate a slow dolly through a reference scene to test whether the sightlines work, whether the practical light source reads correctly from a given angle, and whether the camera has a natural path through the space. Three takes, three camera angles, all reviewed before the construction budget is committed.

Act Two holds spatial logic reasonably well on simpler compositions. Where it breaks down is on complex geometry: detailed cornicing, layered architectural depth, or any scene where the camera move reveals multiple receding planes simultaneously. Geometry collapse and motion blur artefacts are the most commonly reported failure modes in interior scenes. For broad compositional testing, that is acceptable. For sightline precision in a detailed set, it is a limitation worth acknowledging upfront.

The useful comparison here is with storyboards. A storyboard communicates composition and sequence: where the subject sits in frame, what the editor cuts to next, and how the emotional beat of a shot reads. Act Two communicates something different: motion and perceived camera feel. Both are necessary. Neither replaces the other. A Stensyl Storyboards session produces a clear, shareable scene-by-scene visual record that a director, DP, and production designer can annotate together. An Act Two clip shows whether the camera move through that storyboarded scene feels right in motion.

The practical advantage is iteration speed. Reviewing three camera approaches in Act Two takes roughly five minutes of generation time (three ten-second clips at around 90 seconds each). Physically redressing a set to test the same options takes hours. For pre-production conversations between directors and production designers, that speed difference is significant.

Storyboards show composition. Act Two shows motion. A complete pre-vis pass uses both, not one instead of the other.

Motion Designers: When Generative Video Meets Motion Graphics

Motion designers work in a different register to film creatives, and Act Two's relevance to them is correspondingly narrower but still real. The strongest use case is generating live-action-style atmospheric footage and background plates to composite against motion graphics layers built in After Effects or Cinema 4D. Act Two produces footage. It does not produce native motion-graphics timelines, keyframe data, or anything a motion designer can open directly in their preferred tool. The handoff is always via exported clip.

Where that matters: if the brief calls for a brand film that layers kinetic type over a moving environmental plate, Act Two can generate the plate. A moody rack-focus through an abstract urban environment, a slow-motion particle haze, a drifting atmospheric exterior. These are tasks where Act Two's cinematographic strengths align with what motion designers actually need from a footage source.

Where it does not help: kinetic typography environments that need precise spatial control, particle systems built to specific timing, or any scene where the geometry needs to hold exactly over time. Act Two's reported drift and distortion on complex motion means it cannot be trusted for shots where a background element needs to track precisely against a foreground motion-graphics element across the full clip duration. That kind of work still belongs in Cinema 4D or a motion-capture pipeline.

Stensyl's Motion surface, built on Remotion, sits at the opposite end of this spectrum. It handles keyframe-driven motion graphics with a programmatic export pipeline. The two tools are not competing for the same brief. Act Two is texture and atmosphere. Stensyl's Motion surface is controlled, repeatable, timeline-precise output. A motion designer working on a complex brand campaign might use both: Act Two for the atmospheric plate, Stensyl's Motion surface for the type and graphic layers that sit on top.

| Brief type | Act Two | Stensyl Motion surface |
|---|---|---|
| Atmospheric background plate | Strong | Not designed for this |
| Kinetic typography | Weak | Strong |
| Camera-move previs | Strong | Not designed for this |
| Programmatic motion export | Not applicable | Strong |
| Particle-heavy controlled scene | Unreliable | Strong |

The practical verdict: Act Two earns its place in the motion designer's toolkit as a texture and atmosphere generator, not as a replacement for keyframe-driven motion work. Treat its output as source material, not finished deliverable.

Prompt Craft: How to Direct Act Two Like a Cinematographer

The prompts that produce consistent results in Act Two share a clear structure: subject, environment, camera movement, light quality, and pacing as distinct variables, specified separately rather than blended into a single vague sentence. Consumer-language prompts ("show a cool room with some camera movement") consistently underperform compared to prompts that use recognisable cinematographic vocabulary.

Here is the same scene rewritten across three levels of specificity:

Vague prompt

"A product reveal on a dark background with some movement."

Improved prompt

"A slow push-in towards a glass perfume bottle on a black reflective surface. Soft key light from screen-left. Shallow depth of field. 24fps."

Precise, cinematographic prompt

"Motivated low-angle dolly towards a glass perfume bottle on a lacquered black surface. Practical reflection visible in bottle surface. Soft diffused key from camera-left at 45 degrees, no fill, deep shadow on camera-right. Rack focus from bottle base to bottle neck mid-move. Measured pace, no motion blur. 24fps."

The difference in output quality between the first and third prompt is substantial. Act Two responds to the same vocabulary a DP would use in a shot list. The more precisely you specify the camera's intention, the less interpolation the model has to do, and the more predictable the result.

Apply the same approach to a set walkthrough or an abstract brand sequence. For a set walkthrough: name the architectural type, specify whether the camera leads or follows the space, describe the light source (practical or motivated), and state whether you want depth-of-field shift. For an abstract brand sequence: anchor the motion to a recognisable spatial logic (moving through a tunnel of light, orbiting a geometric form) rather than describing an abstract mood.

The failure modes to anticipate and prompt against:

  • Motion blur artefacts: Specify a measured or slow pace. Fast camera moves increase the likelihood of smearing.
  • Character drift across cuts: Use a consistent character reference and avoid prompting for mid-clip appearance changes.
  • Geometry collapse on complex interiors: Simplify the spatial description. The more receding planes the model has to hold simultaneously, the higher the risk of collapse.
  • Hand and body distortion: Limit prompts to upper-body or face-forward compositions where possible. Full-body gesture sequences are the most commonly reported distortion trigger.
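These checks can be run against a draft prompt before spending a render. The following is a rough sketch only: the keyword lists and the function name `flag_risky_prompt` are illustrative assumptions, not anything Act Two exposes, and keyword matching will miss any phrasing it has not been told about.

```python
import re

# Hypothetical keyword lists, chosen to mirror the four failure modes above.
# They are illustrative assumptions, not part of Act Two or any Runway API.
RISK_KEYWORDS = {
    "motion blur artefacts": ("fast", "rapid", "whip pan"),
    "character drift": ("changes outfit", "transforms", "morphs"),
    "geometry collapse": ("cornicing", "ornate", "receding planes"),
    "hand and body distortion": ("full-body", "full body", "hands"),
}


def flag_risky_prompt(prompt: str) -> list[str]:
    """Return the failure modes whose keywords appear as whole words in the prompt."""
    text = prompt.lower()
    flagged = []
    for risk, keywords in RISK_KEYWORDS.items():
        # Word boundaries stop "fast" from matching inside "breakfast".
        if any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords):
            flagged.append(risk)
    return flagged


if __name__ == "__main__":
    print(flag_risky_prompt("Fast whip pan around a full-body dancer."))
```

An empty list does not mean a prompt is safe. It only means none of the listed trigger words appeared.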

A repeatable template for film and motion designers:

[Camera movement] towards/through [subject or environment]. [Light source and quality]. [Depth of field behaviour]. [Character or subject framing]. [Pace descriptor]. 24fps.

That template will not guarantee perfect output, but it will consistently outperform unstructured prompts and give you a repeatable basis for iteration.
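The template maps naturally onto a small data structure, which keeps each variable separate and makes iterations easy to compare. The sketch below assumes prompts are submitted as plain text. The `ShotPrompt` class and its field names are hypothetical conveniences, not an Act Two or Runway interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShotPrompt:
    """One field per variable in the template above (names are illustrative)."""
    camera_movement: str
    subject: str
    light: str
    depth_of_field: str
    framing: str
    pace: str
    preposition: str = "towards"  # or "through"
    fps: int = 24

    def render(self) -> str:
        """Join the fields into one prompt, one sentence per variable."""
        sentences = [
            f"{self.camera_movement} {self.preposition} {self.subject}",
            self.light,
            self.depth_of_field,
            self.framing,
            self.pace,
        ]
        if any(not s.strip() for s in sentences):
            raise ValueError("every template variable must be filled in")
        body = " ".join(f"{s.strip().rstrip('.')}." for s in sentences)
        return f"{body} {self.fps}fps."


if __name__ == "__main__":
    shot = ShotPrompt(
        camera_movement="Slow dolly",
        subject="a period drawing room",
        light="Practical firelight, deep shadow",
        depth_of_field="Deep focus throughout",
        framing="Wide framing",
        pace="Measured pace",
        preposition="through",
    )
    print(shot.render())
    # Change one variable at a time between passes.
    print(replace(shot, pace="Very slow pace").render())
```

Because the dataclass is frozen, each iteration is a new object created with `dataclasses.replace`, so it is always clear which single variable changed between two takes.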

Act Two responds to cinematographic vocabulary the same way a DP responds to a shot list. Vague prompts produce vague results. Specify the camera's intention, not just the scene's appearance.

Act Two Inside a Real Creative Workflow

Act Two is a single node in a larger pipeline. Treating it as a standalone studio is where most creative teams hit friction. The practical workflow around it looks like this:

  1. Brief intake in Stensyl's Write surface. Draft the scene or sequence brief using the multi-model writing tools. Establish the directorial intent, the key performance beats, and the spatial requirements before touching any generation tool.
  2. Visual reference gathering on Stensyl's Moodboards. Pull reference photography, film stills, or existing footage that establishes the visual language. This stage directly informs the Act Two prompt vocabulary: if the reference is Vilmos Zsigmond's work on McCabe and Mrs. Miller, your prompt language changes accordingly.
  3. Scene boarding on Stensyl's Storyboards. Map the shot sequence. Establish compositions, emotional beats, and the editorial logic before generating any footage. Act Two cannot tell you what should happen in a sequence. That decision belongs here.
  4. Act Two generation, externally. With a clear brief, reference set, and storyboard sequence, prompt Act Two with precision. Run iterations. Expect three to five passes per shot to find a usable take.
  5. Footage review and frame editing in Stensyl's Editing surface. Once you have usable footage, bring it into Stensyl's frame-level editing studio for colour adjustment, crop, and compositing prep before it moves into the broader post pipeline.

On credit and time cost: three to five iteration passes per shot, at roughly 90 seconds of render time per ten-second clip, is a realistic minimum for a pre-vis pass. For a six-shot sequence, that is 18 to 30 renders, so budget for roughly 30 to 45 minutes of generation time plus multiple review loops. That is still significantly faster than any equivalent physical pre-vis approach, but it is not instantaneous, and it should be reflected in project scheduling.
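The scheduling estimate is straightforward to reproduce. This sketch assumes every pass renders a full ten-second clip at the roughly 90 seconds reported by third-party reviews. Review time between passes is not included, and the function name is illustrative.

```python
RENDER_SECONDS_PER_TEN_SECOND_CLIP = 90  # third-party figure, indicative only


def generation_minutes(
    shots: int,
    passes_per_shot: int,
    render_seconds: float = RENDER_SECONDS_PER_TEN_SECOND_CLIP,
) -> float:
    """Total render time in minutes, excluding review loops."""
    if shots < 1 or passes_per_shot < 1:
        raise ValueError("shots and passes_per_shot must be at least 1")
    return shots * passes_per_shot * render_seconds / 60


if __name__ == "__main__":
    low = generation_minutes(shots=6, passes_per_shot=3)
    high = generation_minutes(shots=6, passes_per_shot=5)
    print(f"Six-shot sequence: {low:.0f} to {high:.0f} minutes of generation time")
```

For six shots this gives 27 to 45 minutes, which matches the 30 to 45 minute budget once the low end is rounded up.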

Where Stensyl's Ray assistant earns its place in this workflow is at the model-selection stage. Before committing Act Two to a brief, it is worth asking Ray whether the specific task calls for Act Two's performance-transfer capability or whether a simpler video generation approach via Stensyl's Generate surface would serve the brief faster. For a motion designer needing a 30-second atmospheric plate rather than a character-driven performance clip, the Generate surface may be the more efficient route. Ray is designed exactly for that decision point.

For simpler briefs, Stensyl's Generate surface handles video generation without the complexity of Act Two's performance-transfer input requirements. If the brief is "give me a ten-second abstract loop of light moving through water for a title card", there is no reason to build a full Act Two pipeline around it. Match the tool complexity to the brief complexity.

Verdict: Where Act Two Earns Its Place and Where It Doesn't

Three disciplines see the clearest return from Act Two given its current capabilities:

  • Film pre-visualisation: Camera move testing before physical set construction. The speed advantage over physical pre-vis is substantial, and the output fidelity is sufficient for directorial decision-making at the pre-build stage.
  • Set design sightline testing: Evaluating how a dressed space reads from specific camera positions before the construction budget locks. Act Two handles simple spatial geometry reliably enough for this purpose.
  • Atmospheric plate generation for motion compositing: Producing live-action-style footage that motion designers composite against in After Effects or Cinema 4D. The output quality on atmospheric and environmental prompts is Act Two's most consistent strength.

Three areas where current limitations make Act Two the weaker choice:

  • Tight product renders: Geometry instability and distortion make Act Two unreliable for product design briefs that require precise surfaces, reflections, and spatial accuracy. A product designer working on a vehicle colourway or a consumer electronics reveal needs deterministic output that Act Two does not currently provide.
  • Exhibition walkthroughs: Exhibition designers presenting spatial experiences to clients need architectural accuracy and consistent geometry across a full walkthrough sequence. The geometry collapse reported on complex interiors makes Act Two a risky choice here.
  • Social content production: For content and social teams where speed and volume matter more than cinematic control, Act Two's iteration overhead and render times make it a poor fit. Tools built for fast-turnaround asset production are a better match for that brief.

Act Two in the current AI video landscape is commonly compared with tools like Luma, Kling, and Pika on axes of cinematic quality, motion control, and character consistency. Its differentiator is the performance-transfer mechanism: that directorial layer is not standard across the field. Whether that differentiator justifies the workflow overhead depends entirely on whether directorial language and camera intent are central to the brief.

If the brief is "show me how this character moves and how the camera feels around them", Act Two is the right reach. If the brief is "give me a pixel-accurate render of this space" or "produce 20 social assets by Thursday", it is not.

The capability that would most shift this verdict in a future review is improved stability, both of geometry on complex interiors and of human body motion. That improvement would open up exhibition design walkthroughs and full-body performance transfers as credible use cases. Until then, Act Two is at its best from the shoulders up and from the camera's point of view: a tool for directors and performance-driven storytellers, not for spatial or product precision work.

Use Act Two when the brief is directorial. When the brief is spatial, accurate, or fast, reach for a different tool. The distinction is not a weakness; it is a workflow decision.
