InfiniteTalk ComfyUI Workflow (WAN 2.1): Img2Vid, Vid2Vid & Multi-Talk

InfiniteTalk is a talking-video system. You feed it an image or an existing video plus an audio track, and it produces a lip-synced clip. A ComfyUI workflow with ready-made nodes lets you run it inside ComfyUI. It’s built around the Wan 2.1 i2v pipeline and uses an audio encoder (Wav2Vec2) to drive mouth and face motion.

Contents

Files you need
Base model and fast preset
Workflow
Example 1 — image to talking video (single)
Example 2 — video to video (new audio, one speaker)
Example 3 — two people talking (multi)
Small tips that helped

What I’m doing

I run InfiniteTalk inside ComfyUI to get three results:

A still photo to a talking video
New audio swapped onto an existing video
Two people talking in the same scene

Files you need

Wan2_1-InfiniTetalk-Single_fp16.safetensors (one speaker)
Wan2_1-InfiniteTalk-Multi_fp16.safetensors (two or more)

Put the file you use in: ComfyUI/models/diffusion_models/.
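
A quick way to confirm the weights landed in the right folder is a small check script. This is only a sketch: the `missing_weights` helper and the `comfy_root` argument are names made up for illustration, and it looks for nothing beyond the two file names listed above.

```python
from pathlib import Path

# File names as listed above; keep only the one(s) you actually use.
INFINITETALK_WEIGHTS = [
    "Wan2_1-InfiniTetalk-Single_fp16.safetensors",
    "Wan2_1-InfiniteTalk-Multi_fp16.safetensors",
]


def missing_weights(comfy_root, names=INFINITETALK_WEIGHTS):
    """Return the weight files not found in <comfy_root>/models/diffusion_models."""
    model_dir = Path(comfy_root) / "models" / "diffusion_models"
    return [name for name in names if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is assumed to be the install folder relative to the current directory.
    for name in missing_weights("ComfyUI"):
        print("missing:", name)
```

If the script prints nothing, both files are where the loader node expects them.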

Base model and fast preset

Base: WAN 2.1 I2V 480p (also works with WAN 2.1 Fusion X and WAN 2.1 720p)
Encoders: Use the same WAN 2.1 text encoder and the same WAN 2.1 VAE you already use
Speed: Lightning LoRA, steps = 4, CFG = 1
Samplers that stayed stable on my card: DPM++ SDE, LCM, FlowMatch. I stayed on DPM++ SDE.

All WAN 2.1 model files: https://aistudynow.com/how-to-run-wan-2-1-fusionx-gguf-advanced-comfyui-workflow-on-low-vram/

I made a quick video tutorial showing the InfiniteTalk workflow inside ComfyUI.

Workflow

Load an image or load a video.
Load the MP3.
Open Resolution Master and press Auto so it copies the image size. If you want a standard size, pick a preset. Either way, you don't have to type the width and height by hand.
In the audio group, set start and end with Audio Crop.
The small math node reads your FPS and the audio length and fills the frame count by itself.
Pick the right InfiniteTalk weight in its node (single or multi).
Render.
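
The frame count the math node fills in comes down to FPS times the cropped audio length. A minimal sketch of that arithmetic, assuming the result is rounded to the nearest whole frame (the function name and the rounding are assumptions, not the node's actual code):

```python
def frame_count(fps, crop_start_s, crop_end_s):
    """Frames needed to cover the cropped audio at the given FPS."""
    if crop_end_s <= crop_start_s:
        raise ValueError("crop end must be after crop start")
    duration_s = crop_end_s - crop_start_s
    return round(fps * duration_s)


print(frame_count(30, 0, 12))  # 360
```

A 12 s crop at 30 FPS gives 360 frames, which is the number the node reports in Example 2.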

Example 1 — image to talking video (single)

Photo size: 1792 × 2368
Resolution Master: Auto picked 720 × 960 (fine for WAN 2.1)
Audio: 42 s → Audio Crop from 0 to 42
InfiniteTalk file: infinite_talk_single.safetensors
Sampler: DPM++ SDE
Lightning: steps 4, CFG 1

What I saw: lips match words from start to end. Small blinks. Small head moves. On my GPU it took a bit over 20 min and used about 13–16 GB VRAM.
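
How Resolution Master chooses its Auto size is not documented here, but 720 × 960 from a 1792 × 2368 photo is consistent with scaling the long side to 960 and snapping both sides to a multiple of 16. The sketch below shows that guess only; the `fit_resolution` helper, the 960 target, and the multiple of 16 are all assumptions for illustration.

```python
def fit_resolution(width, height, long_side=960, multiple=16):
    """Scale so the longer side equals long_side, then snap both sides to a multiple."""
    scale = long_side / max(width, height)

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


print(fit_resolution(1792, 2368))  # (720, 960)
```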

Example 2 — video to video (new audio, one speaker)

Source video: 1920 × 1080, 30 FPS, 998 frames
Target size: preset 832 × 480
New audio: 27 s, but I only need 12 s → Audio Crop 0 to 12
FPS: keep 30 FPS with get_fps ON. The math node fills 360 frames.
InfiniteTalk file: single
Lightning: 4 / 1
Sampler: DPM++ SDE
Prompt: “looking at the phone, natural review expression”

What I saw: when he says “pixel user,” the mouth shape lands on time. Pauses also look right. It reads like native speech, not a dub.

Example 3 — two people talking (multi)

InfiniteTalk file: infinite_talk_multi.safetensors
Base: WAN 2.1 I2V 480p (same encoder, same VAE)
One photo: a man and a woman in a car. I press Auto in Resolution Master so the size is set automatically.

Two audio tracks:

Man: 0 to 9 s
Woman: 0 to 12 s

Each voice gets Load Audio and Audio Crop. The math node sets the frame counts from your FPS and length.
Lightning: 4 / 1
Sampler: DPM++ SDE

What I saw: about 21 s total. When the man talks, the woman looks at him. When she talks, he turns. Lip-sync stays steady.
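
The 21 s total is consistent with the two crops playing one after the other (9 s + 12 s). A small sketch of that timeline arithmetic, assuming back-to-back playback and an example rate of 25 FPS; the FPS value and the `speaker_frame_ranges` helper are assumptions, not something the workflow exposes.

```python
def speaker_frame_ranges(durations_s, fps):
    """Map each speaker to a (start_frame, end_frame) range, played back to back."""
    ranges = {}
    cursor = 0
    for speaker, seconds in durations_s.items():
        frames = round(seconds * fps)
        ranges[speaker] = (cursor, cursor + frames)  # end frame is exclusive
        cursor += frames
    return ranges, cursor


ranges, total_frames = speaker_frame_ranges({"man": 9, "woman": 12}, fps=25)
print(ranges)        # {'man': (0, 225), 'woman': (225, 525)}
print(total_frames)  # 525
```

At 25 FPS, 525 frames is exactly 21 s of video.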

Small tips that helped

In WAN Video Long I2V, set Motion Frames according to your output FPS: use 25 for 30 FPS, 20 for 25 FPS, and 16 for 16 FPS.
If color shifts between frames, keep ColorMatch OFF.
Two or more speakers: add one more Load Audio + Audio Crop pair per voice with clear start and end times.
Keep Lightning LoRA at steps 4, CFG 1 for fast tests.
If your VRAM is small, start with 480p (WAN 2.1 I2V 480p) and upscale later.
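
The Motion Frames pairs from the first tip fit in a small lookup. This sketch encodes only the three values listed there and refuses to guess for any other FPS (the helper name is hypothetical):

```python
# FPS -> Motion Frames, exactly as listed in the first tip.
MOTION_FRAMES_BY_FPS = {30: 25, 25: 20, 16: 16}


def motion_frames_for(fps):
    """Return the Motion Frames value for a listed FPS, or raise for unlisted ones."""
    try:
        return MOTION_FRAMES_BY_FPS[fps]
    except KeyError:
        raise ValueError(f"no Motion Frames value listed for {fps} FPS") from None


print(motion_frames_for(30))  # 25
```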


Wade says:

August 27, 2025 at 7:22 am

Can you post what setup you have for ComfyUI, Python version, PyTorch version, etc.? I use Windows. I have not been able to get this to work with Python 3.12. It starts to run and then gets an error about missing Triton. When I try to install Triton, it always fails saying “no compatible version found”.

- Esha says:
  
  August 27, 2025 at 10:13 pm
  
  Download the latest ComfyUI portable version: https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z
  Navigate to the python_embeded folder.
  Open that folder in cmd.
  .\python.exe -m pip install triton-windows
  
hari says:

August 27, 2025 at 12:27 pm

In Example 3 (two people talking, multi), please help me, ma’am. I’m using infinitetalk_multi and WanVideoSampler fails with “'NoneType' object has no attribute 'max'”. How do I fix it?

- Esha says:
  
  August 27, 2025 at 10:09 pm
  
  Can you send the image to my email, 23scienceinsights@gmail.com, so I can check?
  
  - hari says:
    
    August 29, 2025 at 2:43 am
    
    Please check your email, ma’am. I’ve sent it.
