How to Prompt a Cinematic Short: The Librarian

By Adam Morgan | 16 April 2026 | 7 min read

A production breakdown for a 4-sequence cinematic short. The quiet hero. The colossal threat. The 5-pillar prompt structure that holds it all together.

A retired librarian is thrown through a clock tower by a forty-storey biomechanical horror, then gets up and wins the fight. This is the premise of The Librarian, a cinematic short built around one principle: the wider the gap between expectation and reality, the more cinematic the payoff. This guide is the full production breakdown, from character reference sheets to the final frame, and the prompt structure that makes the whole thing work.

The unsuspected hero archetype lives or dies on contrast. Establish normality with obsessive detail. Then obliterate it.

The Concept

The story follows Arthur Pembrook, a 78-year-old retired librarian, living alone in a quiet terraced house in northern England. Tweed jacket. Reading glasses on a chain. A cat named Tolstoy. He is the last person on earth you'd expect to hold his own against The Archivist, a 40-storey biomechanical colossus that tears through his front wall in the opening minute.

That's exactly why it works. Every prompt in this production is built to establish calm so thoroughly that its destruction lands like a physical blow. The model has to believe the tea is steaming, the fire is hissing, the cat is purring. Only then does the breach feel earned.

The 5-Pillar Prompt Structure

Every shot in this production runs through the same five pillars. Cramming a full scene into a single paragraph is the fastest way to lose an AI video model: it picks what to follow and what to ignore. Layering information by category gives you predictable results across sequences.

  • Style & Atmosphere — the visual language. Ken Loach domestic realism for the calm. Anime-inflected cinematic action for the climax. These tags set the camera grammar before anything else.
  • Narrative Overview — the beat of the shot. What happens, in one clear sentence. One generation equals one moment.
  • Dynamic Description — the action. Broken into phases: establish, disrupt, resolve. Each phase is its own beat.
  • Static Description — the visuals. What the camera sees that isn't moving. Match reference images by name: "match Image Ref A exactly."
  • Audio & Sound Design — yes, even when you plan to add sound in post. Describing the audio changes how AI video models render motion and pacing.
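One way to keep the five pillars consistent from shot to shot is to treat them as fields of a small record rather than free text. The sketch below is only an illustration of that idea: the ShotPrompt class, its field names, and the render method are hypothetical, and the dynamic and narrative wording is invented for the example. Only the pillar labels and the order come from the list above.

```python
from dataclasses import dataclass

# Pillar labels and their order follow the article; attribute names are hypothetical.
PILLARS = (
    ("style", "Style & Atmosphere"),
    ("narrative", "Narrative Overview"),
    ("dynamic", "Dynamic Description"),
    ("static", "Static Description"),
    ("audio", "Audio & Sound Design"),
)


@dataclass(frozen=True)
class ShotPrompt:
    style: str
    narrative: str
    dynamic: str
    static: str
    audio: str

    def render(self) -> str:
        """Join the five pillars into one labelled prompt, always in the same order."""
        missing = [label for attr, label in PILLARS if not getattr(self, attr).strip()]
        if missing:
            # Refuse to render a shot that skips a pillar.
            raise ValueError(f"empty pillar(s): {', '.join(missing)}")
        return "\n\n".join(
            f"{label}: {getattr(self, attr).strip()}" for attr, label in PILLARS
        )


# Illustrative wording for the Sequence 01 establishing shot.
establishing = ShotPrompt(
    style="Domestic realism. Warm tungsten practicals. Muted earth tones.",
    narrative="An elderly man reads in his armchair while his cat sleeps beside him.",
    dynamic="Establish: he turns a page. Disrupt: nothing. Resolve: he keeps reading.",
    static="Wide locked-off interior. Worn leather armchair. Match Image Ref A exactly.",
    audio="Gas fire hissing. Cat purring.",
)

print(establishing.render())
```

Because render refuses an empty pillar, a shot that forgets its audio description fails before it reaches the model instead of producing an unpredictable clip.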

Reference Images Anchor Everything

Before a single video clip is generated, this production needs four reference images locked in. Generate them in the Image Studio using a consistent seed and style. These are your anchors for every subsequent shot.

If it needs to look the same in two clips, it needs a reference image. Two identical prompts with no references will produce two completely different characters.
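The rule above can be checked mechanically before a prompt is submitted. This is a minimal sketch, assuming prompts name their anchors with the literal phrase "Image Ref" followed by a letter; the missing_references helper and the sample prompt text are hypothetical, while the four labels are the ones used in this production.

```python
import re

# The four anchors used in this production.
REFERENCES = {
    "A": "Arthur",
    "B": "The Archivist",
    "C": "Arthur's Terraced House",
    "D": "The Town",
}


def missing_references(prompt: str, required: set[str]) -> set[str]:
    """Return the required reference letters that the prompt never names."""
    unknown = required - set(REFERENCES)
    if unknown:
        raise KeyError(f"unknown reference(s): {sorted(unknown)}")
    # Collect every letter that appears as "Image Ref <letter>" in the prompt.
    named = set(re.findall(r"Image Ref ([A-Z])\b", prompt))
    return required - named


# Hypothetical breach prompt that anchors the house and the creature but not Arthur.
breach_prompt = (
    "Interior of Image Ref C. A colossal hand matching Image Ref B "
    "tears the front wall outward."
)

print(missing_references(breach_prompt, {"A", "B", "C"}))
```

If Arthur appears in the shot, the non-empty result is the cue to add his reference before generating.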

Arthur — Image Ref A

Wiry build, silver hair parted neatly, round tortoiseshell glasses, hearing aid in the left ear. Brown tweed jacket over a cream cable-knit jumper. Corduroy trousers. Scuffed brown brogues. A hardback book in his right hand. Face: kind, weathered, deep smile lines. Posture slightly stooped. The face and clothing of a man you'd sit next to on a northern bus and never remember.

The Archivist — Image Ref B

A 40-storey biomechanical entity. Vaguely humanoid but wrong. Elongated torso, four asymmetric limbs, a head that resembles a cathedral dome split open. Surface texture: corroded bronze plating fused with organic sinew and bioluminescent amber veins. Moves with tectonic weight. Emits a low, resonant hum like a pipe organ underwater.

Arthur's Terraced House — Image Ref C

Small two-up-two-down Victorian terrace. Red brick, bay window with net curtains, green painted front door with a brass letterbox. Interior: floor-to-ceiling bookshelves in every room. A worn leather armchair by a gas fire. Warm tungsten lighting. A tabby cat asleep on a stack of encyclopaedias. Tea-stained mugs on every surface.

The Town — Image Ref D

Generic northern English market town. Cobblestone high street, Victorian civic buildings, a prominent clock tower, terraced housing climbing up hillsides. Overcast sky. Wet streets reflecting shopfront lights. A town that looks like it hasn't changed in 50 years. Until today.

Sequence 01 — The Silence Before

The opening sequence does one job: establish normality so thoroughly that the audience feels every second in which nothing happens. Arthur reads. He makes tea. He speaks to the cat. Thirty seconds of obsessive domestic detail, then the ground begins to tremble. Tea ripples in the cup. A framed photo rattles on the mantelpiece. The cat lifts its head. Then the entire front wall is torn outward in a single violent motion and The Archivist's hand reaches through the gap.

The prompt for the establishing shot is a study in restraint. Wide locked-off interior shot. Elderly man in worn leather armchair. Brown tweed jacket, round glasses, reading a hardback book. Tabby cat curled on chair arm. Tea steaming on a side table. Warm tungsten practicals. Muted earth tones. Shallow depth of field. No camera movement. Perfect stillness. The point is the absence of event.

The breach prompt reverses every atmospheric flag. Violent action. Handheld. Chaotic energy. Motion blur. Harsh daylight flooding a warm interior. Brick dust, shattered glass, splintered timber filling the frame. The tonal shift from "no camera movement" to "handheld, motion blur, chaotic" produces dramatic results because you're telling the model to change modes completely between shots.

AI video models respond well to extreme tonal contrast between sequential prompts. Give them a hard tonal pivot and they'll render it.
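A quick way to confirm that two consecutive prompts really do pivot is to compare their atmosphere flags as sets. The sketch below uses the flags from the establishing and breach prompts described above; the pivot_report helper and the definition of a hard pivot as "no flag carried over" are assumptions made for illustration, not a rule any model enforces.

```python
# Atmosphere flags from the Sequence 01 establishing and breach prompts.
CALM = {
    "locked-off",
    "no camera movement",
    "perfect stillness",
    "warm tungsten practicals",
    "muted earth tones",
    "shallow depth of field",
}
BREACH = {
    "handheld",
    "motion blur",
    "chaotic energy",
    "violent action",
    "harsh daylight",
}


def pivot_report(before: set[str], after: set[str]) -> dict:
    """Summarise which flags were dropped, introduced, or carried between two shots."""
    carried = before & after
    return {
        "dropped": sorted(before - after),
        "introduced": sorted(after - before),
        "carried_over": sorted(carried),
        # Assumption: a pivot counts as "hard" when nothing carries over.
        "hard_pivot": not carried,
    }


report = pivot_report(CALM, BREACH)
print(report["hard_pivot"], report["carried_over"])
```

Any flag listed under carried_over is a candidate for removal if the second shot still feels too close to the first.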

Sequence 02 — The Reckoning

The Archivist hurls Arthur across the town. He punches through the clock tower, collapses a row of Victorian shopfronts, and embeds himself in the facade of the town hall. Dust settles. He's sitting in a crater, still holding his book. He adjusts his glasses, looks back toward the creature, and says: "Terribly rude."

This is where the film pivots. Until this point the audience has no reason to think Arthur is anything other than a man about to die. The "Terribly rude" line, delivered at conversational volume with perfect composure, is the first signal that something is very wrong with their assumptions about this story.

Cinematographically, the sequence needs extreme kinetic realism. Harsh midday overcast light. High shutter speed. Camera conveying supersonic velocity through aggressive tracking shots and visible atmospheric distortion. The palette shifts from warm domestic tones to cold industrial greys. Think the plane catch from Superman Returns crossed with the freeway sequence from The Matrix Reloaded.

Sequence 03 — The Impossible Fight

Arthur launches himself back toward The Archivist and what follows is a full-scale aerial and ground-level battle. His movements are precise, efficient, almost academic. No wasted energy. Every strike calculated. The creature swings with seismic force; Arthur dodges with absurd agility. The fight escalates until he cracks the creature's bronze plating and drives it earthward with a single downward strike.

Style: anime-inflected cinematic action. Speed lines, impact flashes, exaggerated physics. The camera feels live and dangerous — long sweeping helicopter tracking shots for scale, snapping to shaky handheld ground-level shots for impact. High contrast lighting: dark storm clouds overhead, fire and bioluminescent glow from below. Every frame looks expensive.

The visual key for this sequence is Arthur's posture transformation. He's no longer stooped. Upright, precise, controlled. His tweed jacket billows. His movements are surgical, like he's been doing this for centuries and considers it mildly tedious. The Archivist progressively degrades — bronze plating cracking, bioluminescent veins flickering irregularly, amber fluid spraying outward. A machine losing power.

Sequence 04 — Quiet After Storm

The dust settles. Arthur retrieves his book from a surviving window ledge, walks home through the wreckage, and sits back down in his armchair — which is now exposed to the open sky, half the house missing around it. The cat returns. He makes another cup of tea. He opens his book.

The final frame is a visual mirror of Sequence 01's establishing shot. Same chair. Same man. Same composition. But every element has been transformed by context. The stillness that felt mundane now feels earned. The missing wall turns the room into an open stage, framed against a destroyed town at golden hour. Dust particles catch the warm directional light. Floral wallpaper meeting open sky. A reading lamp illuminating a room with no ceiling.

The final frame works because it's a mirror, not a sequel. You don't change the character. You change the audience's understanding of them.
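The mirror can be made explicit by deriving the closing frame from the opening one and overriding only the context. This is a sketch under assumptions: the Frame record and its field names are hypothetical, and the field values paraphrase the descriptions of the two shots given in this guide.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Frame:
    framing: str
    subject: str
    furniture: str
    surroundings: str
    lighting: str


# Sequence 01 establishing shot.
opening = Frame(
    framing="Wide locked-off shot, no camera movement",
    subject="Elderly man reading a hardback book, match Image Ref A exactly",
    furniture="Worn leather armchair",
    surroundings="Intact book-lined living room",
    lighting="Warm tungsten practicals",
)

# Sequence 04 final frame: same chair, same man, same composition.
# Only the context fields are overridden.
closing = replace(
    opening,
    surroundings="Front wall and ceiling missing, destroyed town behind, "
    "floral wallpaper meeting open sky",
    lighting="Golden hour, warm directional light catching dust particles",
)

# List the fields that differ between the two frames.
changed = [
    f.name for f in fields(Frame) if getattr(opening, f.name) != getattr(closing, f.name)
]
print(changed)
```

Keeping framing, subject, and furniture untouched is what makes the last shot read as a mirror rather than a new setup.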

The Production Workflow

The order of operations matters. References first, always. Generate Arthur as a character sheet in the Image Studio using Nano Banana Pro at 2K resolution. Same for The Archivist, the house, the town. Save everything to a Project folder. Only then do you open the Film Studio and start generating video clips.

Run 2–4 generations per shot. Pick the best 3–5 seconds from each. Assemble in post. Add sound design, colour grading, and titles. The model doesn't make the film. Your editing does.
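The numbers above translate into a rough budget once you know how many shots you need. The sketch below assumes one kept clip per shot, which is one reading of the guidance, and uses a hypothetical 12-shot list; the plan function is illustrative only.

```python
def plan(shots: int, gens_per_shot=(2, 4), seconds_kept=(3, 5)) -> dict:
    """Estimate (min, max) generations to run and usable seconds of footage.

    Assumes exactly one clip is kept per shot.
    """
    if shots < 1:
        raise ValueError("shots must be at least 1")
    return {
        "generations": (shots * gens_per_shot[0], shots * gens_per_shot[1]),
        "usable_seconds": (shots * seconds_kept[0], shots * seconds_kept[1]),
    }


# Hypothetical shot count for illustration.
budget = plan(12)
print(budget)
```

For 12 shots this gives 24 to 48 generations and 36 to 60 seconds of usable footage before editing.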

The Takeaway

Cinematic AI film isn't about finding the perfect prompt. It's about building a system that produces usable, spliceable, narratively coherent footage across a whole sequence. References anchor identity. The 5-pillar structure anchors information. Extreme tonal shifts between sequential prompts give you dynamic range. The rest is editing.

Open the Film Studio, generate your character sheets first, and build the normality before you destroy it.

Tags: AI film, Seedance, Kling, prompt guide, cinematic
