| I want to... | Jump to |
|---|---|
| Turn a photo into a talking video | The Core Pipeline → Motion Control → Lip Sync |
| Make a music video | Multi-Angle Technique |
| Clone my voice for narration | Voice Cloning |
| Fix a failed generation | Repair & Salvage |
| Understand what tools to use | Tool Selector |
The 80/20 Rule: Your source image determines 80% of your final quality. Everything downstream is damage control.
| Tool | Best For | Link |
|---|---|---|
| Flux 2 Pro | Photorealism, fine detail | flux.ai |
| Midjourney v7 | Artistic interpretation, cultural references | midjourney.com |
| Ideogram 3.0 | Text rendering in images | ideogram.ai |
| Leonardo AI | Variations, consistency | leonardo.ai |
📺 Tutorial: Midjourney V7 Complete Guide (official documentation for V7 features, including Draft Mode and Omni Reference)
Example:
Woman in red dress answering phone, annoyed expression,
shallow focus, practical lighting, 35mm film grain, hotel lobby
⚠️ Common Mistake: Long environment descriptions. Keep it minimal: complex backgrounds create animation problems later.
If a face looks "almost right" but something's off, fix it NOW. Motion amplifies every flaw.
NanoBanana (nanobanana.com) corrects:
- Facial proportions without changing identity
- Eye alignment and asymmetry
- Mouth/jawline for animation readiness
Image Generation → NanoBanana (1 pass only) → Animation
💡 Pro Tip: One pass only. Multiple passes flatten expression.
| Your Shot Needs... | Use This |
|---|---|
| Precise body movement from reference video | Kling Motion Control |
| Dialogue with built-in audio | Google Veo 3.1 |
| Physical weight and grounded movement | Hailuo Minimax |
| Fast turnaround, good physics | Luma Dream Machine |
| Style transformation of real footage | Runway Gen-4 |
| Long-form narrative coherence | OpenAI Sora 2 |
| Platform | Link |
|---|---|
| Kling AI | klingai.com |
| Google Veo | deepmind.google/veo |
| Hailuo/Minimax | hailuoai.video |
| Luma Dream Machine | lumalabs.ai |
| Runway | runwayml.com |
| OpenAI Sora | openai.com/sora |
| Pika Labs | pika.art |
What it does: Transfer your recorded performance onto any character. Your body drives their body.
📺 Official Guide: Kling Motion Control User Guide
📺 Deep Dive: Higgsfield Motion Control Guide
- Record yourself (3-30 sec, stationary camera, single person)
- Upload reference video (your performance)
- Upload character image (who you want to become)
- Choose mode:
- Exact = Static camera, precise match
- Partial = Camera can move independently
- Generate
| Do | Don't |
|---|---|
| ✅ Match framing (waist-up reference → waist-up output) | ❌ Full-body reference for close-up output |
| ✅ Empty hands in character image | ❌ Hold props (they disappear) |
| ✅ Neutral mouth in character image | ❌ Open mouth or teeth showing |
| ✅ Single subject only | ❌ Multiple people |
| ✅ Stationary camera | ❌ Pans, zooms, handheld shake |
Turn one performance into a multi-camera edit:
1. Record ONE continuous performance
↓
2. Generate 3-5 character images (different angles/backgrounds)
↓
3. Run SAME reference video against EACH image
↓
4. All outputs sync perfectly (they share timing)
↓
5. Edit together with beat-matched cuts
💡 Pro Tip: Segment into 10-second chunks. Generate all angles before editing.
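The chunking-and-pairing step above can be sketched as a small planner. This is a hypothetical helper (platform APIs vary, so it only builds a job list rather than calling a real generation endpoint); the key property is that every angle image is paired with the same reference segments, which is what keeps the outputs in sync.

```python
def plan_multi_angle_jobs(duration_s, angle_images, chunk_s=10):
    """Split one reference performance into 10-second chunks and pair
    each chunk with every character image, so all angles share timing."""
    chunks = []
    start = 0
    while start < duration_s:
        chunks.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s
    # One generation job per (chunk, angle). Outputs stay in sync
    # because every angle is driven by the same reference segment.
    return [
        {"ref_start": a, "ref_end": b, "character_image": img}
        for (a, b) in chunks
        for img in angle_images
    ]

# A 35-second performance with 3 angle images yields 4 chunks x 3 angles = 12 jobs.
jobs = plan_multi_angle_jobs(35, ["front.png", "side.png", "low.png"])
```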
| Tool | Best For | Link |
|---|---|---|
| Kling Lip Sync | Integrated with motion, handles singing | Built into Kling AI |
| Magic Hour | Highest realism, extreme poses | magichour.ai |
| HeyGen | Avatars, multilingual | heygen.com |
| Sync.so | Style learning, dialogue editing | sync.so |
Generate video (face visible, 5-10 sec)
↓
Isolate vocals from audio (Lalal.ai or Moises.ai)
↓
Apply lip sync tool
↓
Recombine with instrumental in editor
⚠️ Never feed a full music track to a lip sync tool. Isolate vocals first. Ultimate Vocal Remover (UVR) is excellent and free.
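The isolate-then-recombine steps can be scripted. A minimal sketch, assuming Demucs for separation (its `--two-stems=vocals` mode writes `vocals.wav` and `no_vocals.wav`) and ffmpeg for the remux; filenames are placeholders:

```python
def separation_cmd(song):
    # Demucs two-stem mode: splits the track into vocals and everything else
    return ["demucs", "--two-stems=vocals", song]

def recombine_cmd(lipsynced_video, instrumental, out):
    # Copy the lip-synced video stream untouched, replace its audio
    # with the instrumental stem (-map picks streams per input)
    return [
        "ffmpeg", "-y", "-i", lipsynced_video, "-i", instrumental,
        "-map", "0:v", "-map", "1:a",
        "-c:v", "copy", "-shortest", out,
    ]
```

Run each list with `subprocess.run(cmd, check=True)` once the real file paths are in place.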
- Generate base video with visible face
- Enable "Match Mouth" tracking (~10 min processing)
- Upload clean isolated vocal audio
- Adjust frame offset in your editor if needed
| Platform | Best For | Free Tier | Pro Price | Link |
|---|---|---|---|---|
| ElevenLabs | Emotional range, English | 10K chars/mo | $22/mo | elevenlabs.io |
| Fish Audio | Emotion control, multilingual | Limited | $5-330 | fish.audio |
| Play.ht | 100+ languages | Limited | $14-198 | play.ht |
| Resemble AI | API access, enterprise | Pay-as-you-go | $29-99 | resemble.ai |
| Respeecher | Film industry standard | None | ~$167/mo | respeecher.com |
📺 Tutorial: ElevenLabs Voice Cloning Guide (official documentation for instant and professional voice cloning)
- Record 1-3 minutes of clean audio (no background noise)
- Upload to Voices β Create Voice β Instant Clone
- For pro quality: 30+ minutes audio, use Professional Clone (Creator plan required)
⚠️ Legal: Get written consent for any voice you clone commercially.
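Before uploading, it's worth checking that your samples actually meet the duration thresholds above. A stdlib-only helper (hypothetical, using the thresholds from this guide: 1+ minute for instant clone, 30+ minutes for professional):

```python
import wave

def clone_mode_for(paths):
    """Total up WAV sample durations and suggest a cloning tier:
    1-3 min of clean audio -> instant clone; 30+ min -> professional."""
    total_s = 0.0
    for p in paths:
        with wave.open(p, "rb") as w:
            total_s += w.getnframes() / w.getframerate()
    if total_s >= 30 * 60:
        return "professional"
    if total_s >= 60:
        return "instant"
    return "too_short"
```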
| Platform | Best For | Link |
|---|---|---|
| Suno v4.5 | Complete songs with vocals, easiest | suno.com |
| Udio | Stem control, pro mixing | udio.com |
📺 Tutorial: Suno Complete Guide (official guide to creating AI music)
Describe song style in Suno
↓
Generate with isolated stems enabled
↓
Feed vocal stem to lip sync
↓
Recombine in your video editor
| Setting | Value | Why |
|---|---|---|
| Model | Proteus | Best for AI-generated content |
| Output | 4K (3840×2160) | Distribution standard |
| Recover Detail | 0 | Detail recovery amplifies AI artifacts |
📺 Tutorial: Topaz Video AI Documentation
Link: topazlabs.com/topaz-video
Different AI clips have different textures. Grain unifies everything.
In DaVinci Resolve:
1. Place grain asset above all footage
2. Blend mode: Overlay
3. Opacity: ~30%
This single step often does more than hours of per-clip color correction.
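If you'd rather bake the grain in per-clip outside Resolve, the same recipe maps to ffmpeg's `blend` filter. A sketch assuming the grain plate matches the clip's resolution and is at least as long (otherwise add a scale/loop step first):

```python
def grain_cmd(clip, grain_plate, out, opacity=0.3):
    """ffmpeg equivalent of the Resolve recipe: overlay-blend a grain
    plate on top of the footage at ~30% opacity."""
    graph = f"[0:v][1:v]blend=all_mode=overlay:all_opacity={opacity}"
    return ["ffmpeg", "-y", "-i", clip, "-i", grain_plate,
            "-filter_complex", graph, "-c:a", "copy", "-shortest", out]
```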
Lock framing → Test at low tier → Generate at full quality → Upscale ONCE → Lip sync LAST
Reordering these steps wastes money on content that gets regenerated.
- ✅ Generate at native resolution, upscale once at the end
- ✅ Test complex shots with Standard mode before Professional
- ✅ Batch similar shots before committing
- ❌ Don't use platform "enhancers" (Topaz is better and cheaper per clip)
- ❌ Don't lip sync before final framing is locked
A "failed" generation is often fixable. Repair costs time; regeneration costs credits.
| Tool | Fixes | Link |
|---|---|---|
| FlowFrames | Optical flow smoothing | github.com/n00mkrad/flowframes |
| Topaz Chronos | Frame pacing | Included in Topaz Video AI |
| FaceFusion | Temporal face stabilization | github.com/facefusion/facefusion |
| EbSynth | Style locking across frames | ebsynth.com |
| Symptom | Action |
|---|---|
| Isolated jitter, content is good | Repair |
| Uneven frame pacing | Repair |
| Wrong physics, identity drift | Regenerate |
| Multiple compounding issues | Regenerate |
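The decision table above reduces to a tiny triage rule. A sketch (my own keyword mapping, not an official heuristic) that defaults to regenerating when in doubt, since compounding problems rarely repair cleanly:

```python
import re

# Keywords lifted from the decision table above
REPAIRABLE = {"jitter", "pacing"}
FATAL = {"physics", "identity", "drift", "compounding"}

def triage(symptom: str) -> str:
    """Map a symptom description to 'repair' or 'regenerate'."""
    words = set(re.findall(r"[a-z]+", symptom.lower()))
    if words & FATAL:          # fatal signs win even if repairable ones co-occur
        return "regenerate"
    if words & REPAIRABLE:
        return "repair"
    return "regenerate"        # unknown symptom: regenerating is the safer default
```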
| What You See | What Caused It | Fix |
|---|---|---|
| Teeth morph mid-sentence | Aggressive lip sync | Reduce lip sync strength |
| Floating hands | Reference video framing mismatch | Re-crop reference to match output |
| Eye jitter | Face too small in frame | Generate with larger face |
| Texture crawl | Sharpening or HDR/SDR mixing | Disable sharpening, unify color space |
| Identity drift | Inconsistent reference images | Use Omni Reference for consistency |
| Background loops | Clip too long | Keep under 10 seconds |
Run these checks before final export. If anything fails, regenerate rather than patch.
- Eyes track consistently, no micro-jumps
- Teeth stable across frames
- Hands don't partially disappear
- Clothing doesn't shimmer or crawl
- Background motion doesn't loop
- Hard consonants (p, b, t, d) match lips
- Breathing matches chest movement
- Room tone matches environment size
- Every cut has purpose
- Camera movement has intent
- Emotional state clear within 2 seconds
Teams that storyboard before generation report 30-50% fewer regenerations.
| Tool | Capability | Link |
|---|---|---|
| Boords | Text-to-storyboard, shot continuity | boords.com |
| Shotry AI | AI storyboards with camera metadata | shotry.ai |
| Kive.ai | Visual reference boards | kive.ai |
- Define camera angle, lens, movement intent
- Create reference boards for color/lighting
- Map shot sequence with emotional purpose
- Test with still images before video
Project/
├── 01_Source_Images/
├── 02_Reference_Video/
├── 03_Generations/
│   ├── v1_exploration/
│   ├── v2_selected/
│   └── v3_final/
├── 04_Audio/
├── 05_Upscaled/
├── 06_Edit/
└── prompts.txt
⚠️ Never overwrite generations. Version drift is how quality regressions sneak in.
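The layout above can be scaffolded in one call. `exist_ok=True` makes the script safe to re-run without clobbering anything already generated:

```python
import os

STRUCTURE = [
    "01_Source_Images",
    "02_Reference_Video",
    "03_Generations/v1_exploration",
    "03_Generations/v2_selected",
    "03_Generations/v3_final",
    "04_Audio",
    "05_Upscaled",
    "06_Edit",
]

def scaffold(root="Project"):
    """Create the project tree; re-running never overwrites existing work."""
    for d in STRUCTURE:
        os.makedirs(os.path.join(root, d), exist_ok=True)
    # 'a' mode creates prompts.txt if missing, leaves it alone otherwise
    open(os.path.join(root, "prompts.txt"), "a").close()
```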
AI output is mathematically perfect. Real footage isn't. Add controlled imperfection.
| Tool | What It Does | Link |
|---|---|---|
| Dehancer Pro | Film response curves | dehancer.com |
| FilmBox | Color science emulation | videovillage.co/filmbox |
| CineMatch | Camera-to-film matching | filmconvert.com/cinematch |
Apply after generation, before final grade. Adds halation, grain, highlight rolloff.
- Assume Rec.709 gamma 2.4 unless platform specifies otherwise
- Convert all clips to single working space before editing
- Never mix HDR and SDR without tone mapping
- Generate everything at 24fps or 30fps, never mixed
- Fix frame rate BEFORE lip sync, never after
- If jittery, apply optical flow AFTER upscaling
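Conforming frame rate and color tags can be done in one ffmpeg pass. A sketch that resamples to a single frame rate and tags the output Rec.709; tagging assumes the pixels already are Rec.709 (this guide's default assumption), while true HDR sources need tone mapping instead:

```python
def conform_cmd(src, out, fps=24):
    """Resample to one frame rate and tag the stream as Rec.709
    before any editing or lip sync."""
    return ["ffmpeg", "-y", "-i", src,
            "-vf", f"fps={fps}",
            "-colorspace", "bt709",
            "-color_primaries", "bt709",
            "-color_trc", "bt709",
            "-c:a", "copy", out]
```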
Elements that platform compression crushes hardest:
- Fine skin texture
- Subtle gradients
- Neon lighting
- Fog, smoke, rain
- Add light grain before export (gives encoders texture to preserve)
- Boost contrast slightly
- Avoid pure black backgrounds (macroblock badly)
- Export at higher bitrate than platform recommends
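The export tips above translate to an ffmpeg command like this. Values are starting points I've chosen for illustration, not platform specifications; adjust grain strength, contrast, and bitrate per destination:

```python
def export_cmd(src, out, bitrate="20M", grain=4):
    """Social-export sketch: light temporal grain (gives the encoder
    texture to preserve), a mild contrast lift, and a bitrate above
    typical platform guidance."""
    vf = f"noise=alls={grain}:allf=t,eq=contrast=1.05"
    return ["ffmpeg", "-y", "-i", src, "-vf", vf,
            "-c:v", "libx264", "-b:v", bitrate, "-c:a", "copy", out]
```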
| Tool | Link |
|---|---|
| Flux 2 Pro | flux.ai |
| Midjourney v7 | midjourney.com |
| Ideogram 3.0 | ideogram.ai |
| Leonardo AI | leonardo.ai |
| Tool | Link |
|---|---|
| NanoBanana | nanobanana.com |
| Enhancer.ai | enhancer.ai |
| Topaz Photo AI | topazlabs.com/topaz-photo-ai |
| Topaz Gigapixel | topazlabs.com/gigapixel |
| Tool | Link |
|---|---|
| Kling AI | klingai.com |
| Google Veo | deepmind.google/veo |
| Hailuo Minimax | hailuoai.video |
| Luma Dream Machine | lumalabs.ai/dream-machine |
| Runway Gen-4 | runwayml.com |
| OpenAI Sora | openai.com/sora |
| Pika Labs | pika.art |
| Morph Studio | morphstudio.com |
| Kaiber | kaiber.ai |
| Tool | Link |
|---|---|
| Magic Hour | magichour.ai |
| HeyGen | heygen.com |
| Sync.so | sync.so |
| LipDub AI | lipdub.ai |
| Tool | Link |
|---|---|
| ElevenLabs | elevenlabs.io |
| Fish Audio | fish.audio |
| Play.ht | play.ht |
| Lalal.ai (stem separation) | lalal.ai |
| Moises.ai (stem separation) | moises.ai |
| Tool | Link |
|---|---|
| Suno | suno.com |
| Udio | udio.com |
| Tool | Link |
|---|---|
| Topaz Video AI | topazlabs.com/topaz-video |
| FlowFrames | github.com/n00mkrad/flowframes |
| FaceFusion | github.com/facefusion/facefusion |
| EbSynth | ebsynth.com |
| Tool | Link |
|---|---|
| Higgsfield | higgsfield.ai |
| Freepik AI | freepik.com/ai |
Tools change monthly. These principles don't:
- Capture quality determines your ceiling. No tool compensates for bad inputs.
- Lock framing early. Mid-process reframing cascades problems everywhere.
- Modular separation. Treat body motion, face animation, and voice as independent tracks; combine in editorial.
- Regeneration beats repair. Fresh output usually costs less than fixing broken output.
- Ambiguity multiplies cost. Know exactly what you want before generating.
Last verified: January 2026. Platform capabilities shift rapidly; confirm current features before production.