HeyGen

Six HeyGen tools under one roof: trained avatars, talking photos, text-to-video, lipsync, and 179-language dubbing. One credit pool, one place to learn.

HeyGen pioneered AI talking avatars and still sets the benchmark, and the lineup now goes well beyond a single model. Stensyl integrates the full suite: Avatar V Digital Twin trained from your own footage, Talking Photo to make any portrait speak, Text to Video that scripts and assembles a clip from a prompt, Video Translate into 179 languages, and V3 Lipsync in Precision and Speed. Six tools, one credit pool, one studio.

Example outputs

HeyGen example 1

Train a Digital Twin from a 2-minute clip, then render a 15-second product launch teaser in your own voice and gestures.

HeyGen example 2

Make a generated character portrait talk: upload the image, type a 10-second script, pick a voice (Talking Photo).

HeyGen example 3

Turn a one-paragraph brief into a finished 30-second explainer with a presenter, script, and cuts (Text to Video).

HeyGen example 4

Translate a tutorial from English into Spanish, Hindi, and Japanese with the same presenter on screen (Video Translate).

HeyGen example 5

Dub a finished promo with a new voiceover using frame-accurate lip movement (V3 Lipsync Precision).

HeyGen example 6

Generate caption-tracked social cuts of a podcast clip for TikTok, Reels, and Shorts (V3 Lipsync Speed).

How it works

01

Describe your scene

Type a detailed prompt describing the video you want, or upload a reference image as a starting frame.

02

Choose your settings

Pick your resolution and duration. See the credit cost before you generate.

03

Generate your video

Your video is ready in 1-3 minutes. Download, iterate, or extend the sequence.

Ready to create with HeyGen?

Jump into the Studio and start generating. Plans from £10/month.

The full HeyGen suite, from trained avatars to one-prompt video.

HeyGen is the most respected name in AI talking-head video, and the lineup now reaches far beyond a single model. Avatar V Digital Twin trains a persistent avatar on 2 to 3 minutes of your own footage, learning your face, gestures, body language, and voice in one holistic model. Talking Photo brings a single portrait to life with a script and a voice. Text to Video turns a written brief into a finished, presenter-led cut. Video Translate re-voices and re-lip-syncs any clip into 179 languages. V3 Lipsync replaces the audio on existing footage with frame-accurate mouth movement. Stensyl integrates all of it under one roof.

Avatar V is the upgrade to HeyGen's earlier Avatar IV engine, and it is a real step up. Where the previous generation animated a likeness, V learns gestures, body language, and expression, so renders read like genuine footage of you rather than an animated photo, at effectively the same cost. The trained twin is persistent: build it once and it still works on every render months later. It auto-appears as a Cast member in Storyboards and Film Studio, slots into Canvas as an Avatar Video node, lives in Generate's Talking Avatar mode, and answers to Ray. You can also add new outfits to an existing twin without retraining from scratch.

Everything HeyGen is on every Stensyl plan, billed from one shared credit pool. The transform tools work on anything you bring: Video Translate is 8 credits per input second, V3 Lipsync Speed is 11 and Precision 21 credits per output second, and every lipsync render returns an SRT caption file mirrored into permanent storage. The generative tools bill per output second by duration: Talking Photo at 12 credits a second with no setup, Text to Video at 9. Avatar V renders are 80, 160, and 240 credits at the 5, 10, and 15 second buckets; training a twin is a one-time 425 credits and adding an outfit is 250, since both carry real cost on HeyGen's side.
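
As a rough illustration of how the Avatar V figures above add up, the sketch below totals the one-time training cost, any added outfits, and a batch of renders. The function and constant names are hypothetical, and it assumes a render is billed at the smallest bucket that covers its length, which the figures above do not state explicitly.

```python
# Credit figures come from the paragraph above; all names are illustrative.
AVATAR_V_BUCKETS = {5: 80, 10: 160, 15: 240}  # bucket length in seconds -> credits
TRAINING_CREDITS = 425  # one-time cost to train a twin
OUTFIT_CREDITS = 250    # cost per added outfit


def avatar_render_credits(seconds: float) -> int:
    """Return the credit cost of one Avatar V render.

    Assumption: a render is billed at the smallest bucket that covers it.
    """
    if seconds <= 0:
        raise ValueError("duration must be positive")
    for bucket in sorted(AVATAR_V_BUCKETS):
        if seconds <= bucket:
            return AVATAR_V_BUCKETS[bucket]
    raise ValueError("longest Avatar V bucket is 15 seconds")


def avatar_project_credits(render_seconds: list[float], outfits: int = 0,
                           include_training: bool = True) -> int:
    """Total credits for optional training, outfits, and a batch of renders."""
    total = TRAINING_CREDITS if include_training else 0
    total += outfits * OUTFIT_CREDITS
    total += sum(avatar_render_credits(s) for s in render_seconds)
    return total


if __name__ == "__main__":
    # Train once, add one outfit, render three 15-second teasers.
    print(avatar_project_credits([15, 15, 15], outfits=1))  # 425 + 250 + 3 * 240 = 1395
```

Because training and outfits are one-time charges, the per-clip cost falls toward the plain bucket price as the number of renders grows.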

Avatar V Digital Twin: trained on your own footage

Avatar V is HeyGen's flagship trained-avatar model and the successor to Avatar IV. Upload 2 to 3 minutes of footage of yourself talking naturally on camera and HeyGen builds a persistent model of your face, gestures, body language, and voice. Stensyl wraps this as the My Avatar feature: train once, render thousands of clips with your real presence and delivery across every studio without re-uploading anything. The V engine reads body language and expression the older IV engine flattened, so the output looks shot, not animated. Add new outfits to the same twin as your wardrobe grows, and the avatar you build today still works on every render six months from now.

Talking Photo: make any portrait speak

Talking Photo turns a single still into a talking video. Upload a portrait, generated character, mascot, or historical figure, type a script, pick a voice, and HeyGen produces a clip with lip-sync, facial movement, and natural gesture. No training, no setup fee, no trained twin required. It runs on the same V-class engine as the Digital Twin, so it is a genuine performance rather than the animated thumbnail older photo-to-video tools produced. Use it for one-off presenters, character work, and quick talking-head clips when you do not want to train a full avatar. Billed at 13 credits per second of output.

Text to Video: a finished cut from one prompt

Text to Video takes a single written brief and returns a complete, presenter-led cut. HeyGen scripts it, casts a presenter, and assembles the edit, so you go from an idea to a finished clip without sourcing footage, writing a script, or editing a timeline. Pick the orientation, the length, and optionally the presenter from a curated cast, then describe what you want. Ideal for explainers, product rundowns, and social videos where you need a polished result fast. Billed at 12 credits per second across 15, 30, and 60 second lengths.

Video Translate: one clip, 179 languages

Video Translate re-voices and re-lip-syncs any video into 179 languages while keeping the original speaker on screen. Upload a clip, choose a target language, and HeyGen produces a translated MP4 with matched voice and lip movement, plus an SRT subtitle file. Localise a tutorial, a testimonial, or a full content library into new markets without a re-record or a separate dubbing pass. You can also translate the audio only, leaving faces untouched, for voiceover and faceless footage. Billed at 9 credits per input second.
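
For planning a localisation batch, a back-of-envelope estimate might look like the sketch below. It uses the per-input-second rate quoted in this section as a default parameter. The function name is hypothetical, and it assumes each target language is a separate job billed on the full input length with partial seconds rounded up, none of which is confirmed here.

```python
import math


def translate_credits(input_seconds: float, target_languages: list[str],
                      credits_per_input_second: int = 9) -> int:
    """Estimate credits to translate one clip into several languages.

    Assumptions: each language is a separate job billed on the full input
    length, and partial seconds round up.
    """
    if input_seconds <= 0:
        raise ValueError("duration must be positive")
    per_language = math.ceil(input_seconds) * credits_per_input_second
    return per_language * len(target_languages)


if __name__ == "__main__":
    # A 90-second tutorial into Spanish, Hindi, and Japanese.
    print(translate_credits(90, ["es", "hi", "ja"]))  # 90 * 9 * 3 = 2430
```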

V3 Lipsync: replace the audio on any video

V3 Lipsync replaces the audio on existing footage. Upload any video plus the new audio you want it to speak, and HeyGen reads the speaker's face, infers phoneme positions, and reproduces frame-accurate lip movement. Precision runs the full V3 inference engine for broadcast-quality accuracy at 21 credits per second; Speed runs a faster pipeline at 11 credits per second with marginally lower fidelity, ideal for drafts and high-volume dubbing. Both take the same input, video plus audio, and return an MP4 with a companion SRT caption file. Use it for final dub passes, language swaps, voiceover changes, and single-line revisions on finished video where a reshoot would be too expensive.
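
To make the Speed versus Precision trade-off concrete, here is a small sketch comparing the credit cost of both modes for the same clip, using the per-second rates quoted above. The function name and the rounding up of partial seconds are assumptions for illustration, not documented billing behaviour.

```python
import math

# Per-second rates quoted above; the dict name is illustrative.
LIPSYNC_RATES = {"speed": 11, "precision": 21}


def lipsync_credits(output_seconds: float, mode: str) -> int:
    """Estimate credits for one V3 Lipsync render.

    Assumption: partial seconds are rounded up before billing.
    """
    if mode not in LIPSYNC_RATES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_seconds <= 0:
        raise ValueError("duration must be positive")
    return math.ceil(output_seconds) * LIPSYNC_RATES[mode]


if __name__ == "__main__":
    for mode in ("speed", "precision"):
        print(mode, lipsync_credits(30, mode))
    # speed 330
    # precision 630
```

On these figures a 30-second promo costs 330 credits in Speed and 630 in Precision, so drafting in Speed and running only the final pass in Precision keeps most of the iteration cheap.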

Captions included on every lipsync and translate render

Every V3 Lipsync render and every Video Translate job returns a companion SRT subtitle file alongside the MP4. The file is mirrored into Stensyl storage permanently, so the URL never expires. Download it from the gallery lightbox with one click, or pull it programmatically via the gallery API, then drop it into any editor or social platform for instant caption tracks. Speed and Precision return identical caption quality.
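
Once the SRT file is downloaded, it can be read with a few lines of code. The sketch below parses standard SubRip text into cue tuples. It makes no assumptions about the gallery API, the function name is hypothetical, and the sample captions are made up for illustration.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text: str) -> list[tuple[timedelta, timedelta, str]]:
    """Parse SubRip text into (start, end, caption) tuples."""
    cues = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # The timing line usually follows the cue index; search to be safe.
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
                end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
                cues.append((start, end, "\n".join(lines[i + 1:]).strip()))
                break
    return cues


# Made-up sample in standard SRT layout.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
Welcome to the launch.

2
00:00:02,500 --> 00:00:05,000
Here is what is new.
"""

if __name__ == "__main__":
    for start, end, caption in parse_srt(SAMPLE):
        print(f"{start.total_seconds():.1f}-{end.total_seconds():.1f}: {caption}")
```

Keeping the timings as `timedelta` values makes it straightforward to shift or trim cues when a clip is recut for a different platform.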

Integrated across every Stensyl studio

My Avatar, built on Avatar V, shows up as a Cast member in Storyboards and Film Studio with no extra wiring. Drop the Avatar Video node onto Canvas to wire scripted clips into composable workflows, or ask Ray to render one and she emits a confirm card. Talking Photo and Text to Video sit beside Seedance and Kling in the Generate video picker, while Video Translate and V3 Lipsync live in Generate's Utilities tab, ready for any clip you bring.

Voice cloning baked in

When you train a Digital Twin, Stensyl also clones your voice through HeyGen's standalone voice service so renders speak in your real voice, not a default synthetic one. The same audio source feeds an ElevenLabs clone in parallel for non-avatar surfaces (Write Studio TTS, Marketing Studio carousels, Generate audio tab). One training session, two voice providers, one credit pool. Pair Lipsync or Video Translate with either clone to localise libraries without losing your sound.

Frequently asked

Questions about HeyGen.

Six tools, all sharing one credit pool and one studio: Avatar V Digital Twin (a trained avatar from your own footage), Talking Photo (make any photo talk), Text to Video (a finished clip from a prompt), Video Translate (re-voice and re-lip-sync into 179 languages), and V3 Lipsync in Speed and Precision (replace the audio on any video).

Built differently

Why Stensyl?