InfiniteTalk is a talking-video system. You feed it an image or an existing video plus an audio track, and it makes a lip-synced clip. There’s a ready-made ComfyUI workflow with the nodes already wired up, so you can run it inside ComfyUI. It’s built around the Wan 2.1 I2V pipeline and uses an audio encoder (Wav2Vec2) to drive mouth and face motion.
What I’m doing
I run InfiniteTalk inside ComfyUI to get three results:
- A still photo turned into a talking video
- New audio swapped onto an existing video
- Two people talking in the same scene
Files you need
- Wan2_1-InfiniTetalk-Single_fp16.safetensors (one speaker)
- Wan2_1-InfiniteTalk-Multi_fp16.safetensors (two or more speakers)
Put the file you use in: ComfyUI/models/diffusion_models/
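A quick way to confirm the weights are where ComfyUI expects them is a small check script. This is only a sketch: the file names are the ones listed above, while the ComfyUI root path ("ComfyUI") and the helper name `find_weights` are assumptions you should adapt to your install.

```python
from pathlib import Path

# File names as listed above; adjust if your downloads are named differently.
WEIGHTS = {
    "single": "Wan2_1-InfiniTetalk-Single_fp16.safetensors",
    "multi": "Wan2_1-InfiniteTalk-Multi_fp16.safetensors",
}


def find_weights(comfy_root):
    """Return {mode: path or None} for each InfiniteTalk weight file."""
    folder = Path(comfy_root) / "models" / "diffusion_models"
    found = {}
    for mode, name in WEIGHTS.items():
        path = folder / name
        found[mode] = path if path.is_file() else None
    return found


if __name__ == "__main__":
    # "ComfyUI" is an assumed root folder; point it at your own install.
    for mode, path in find_weights("ComfyUI").items():
        print(mode, "->", path if path else "missing")
```

You only need the file for the mode you plan to run, so one "missing" line is fine.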
Base model and fast preset
- Base: WAN 2.1 I2V 480p (also works with WAN 2.1 Fusion X and WAN 2.1 720p)
- Encoders: Use the same WAN 2.1 text encoder and the same WAN 2.1 VAE you already use
- Speed: Lightning LoRA, steps = 4, CFG = 1
- Samplers that stayed stable on my card: DPM++ SDE, LCM, FlowMatch. I stayed on DPM++ SDE.
All WAN 2.1 model files: (https://aistudynow.com/how-to-run-wan-2-1-fusionx-gguf-advanced-comfyui-workflow-on-low-vram/)
I also made a quick video tutorial showing the InfiniteTalk workflow inside ComfyUI.
Workflow
- Load an image or load a video.
- Load the MP3.
- Open Resolution Master and press Auto so it copies the image size. If you want a standard size, pick a preset. You do not need to type width and height by hand.
- In the audio group, set start and end with Audio Crop.
- The small math node reads your FPS and the audio length and fills the frame count by itself.
- Pick the right InfiniteTalk weight in its node (single or multi).
- Render.
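The frame count the math node fills in is, as far as I can tell, just FPS times the cropped audio length. The sketch below reproduces that arithmetic; the function name `frame_count` is hypothetical, and the real node may round or pad differently.

```python
def frame_count(fps, start_s, end_s):
    """Frames needed to cover the cropped audio span at the given FPS."""
    if end_s <= start_s:
        raise ValueError("end must be after start")
    return round(fps * (end_s - start_s))


# 30 FPS with Audio Crop 0 to 12 s, the case used in Example 2.
print(frame_count(30, 0, 12))  # 360
```

If the number in the workflow differs from this by a frame or two, trust the workflow.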
Example 1 — image to talking video (single)

- Photo size: 1792 × 2368
- Resolution Master on Auto picked 720 × 960 (fine for WAN 2.1)
- Audio: 42 s → Audio Crop from 0 to 42
- InfiniteTalk file: infinite_talk_single.safetensors
- Sampler: DPM++ SDE
- Lightning: steps 4, CFG 1
What I saw: lips match words from start to end. Small blinks. Small head moves. On my GPU it took a bit over 20 min and used about 13–16 GB VRAM.
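I do not know exactly how Resolution Master computes its Auto size. One plausible reading is "scale the long side to a target, then snap both sides to a multiple of 16", and that happens to reproduce the 720 × 960 result for this photo. The target of 960 and the multiple of 16 are assumptions, and `fit_resolution` is a hypothetical helper, not part of the node.

```python
def fit_resolution(width, height, long_side=960, multiple=16):
    """Scale so the longer side equals long_side, then snap to a multiple."""
    scale = long_side / max(width, height)

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


# The 1792 x 2368 photo from this example.
print(fit_resolution(1792, 2368))  # (720, 960)
```

Other sources and presets may not follow this rule, so treat it as a way to predict the size, not as the node's actual logic.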
Example 2 — video to video (new audio, one speaker)
- Source video: 1920 × 1080, 30 FPS, 998 frames
- Target size: preset 832 × 480
- New audio: 27 s, but I only need 12 s → Audio Crop 0 to 12
- FPS: keep 30 FPS with get_fps ON. The math node fills 360 frames (30 FPS × 12 s).
- InfiniteTalk file: single
- Lightning: 4 / 1
- Sampler: DPM++ SDE
- Prompt: “looking at the phone, natural review expression”
What I saw: when he says “pixel user,” the mouth shape lands on time. Pauses also look right. It reads like native speech, not a dub.
Example 3 — two people talking (multi)

- InfiniteTalk file: infinite_talk_multi.safetensors
- Base: WAN 2.1 I2V 480p (same encoder, same VAE)
- One photo: a man and a woman in a car. I press Auto in Resolution Master so size is set.
- Two audio tracks: man 0 to 9 s, woman 0 to 12 s
- Each voice gets its own Load Audio and Audio Crop. The math node sets the frame counts from your FPS and length.
- Lightning: 4 / 1
- Sampler: DPM++ SDE
What I saw: about 21 s total. When the man talks, the woman looks at him. When she talks, he turns. Lip-sync stays steady.
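The 21 s total is simply 9 s plus 12 s, which suggests the two crops play one after the other. The sketch below lays the tracks out that way and computes frames per voice. Back-to-back playback is my assumption, the 25 FPS value is only an illustration (the FPS for this example is not stated), and `plan_tracks` is a hypothetical helper.

```python
def plan_tracks(tracks, fps):
    """Lay out (name, start_s, end_s) crops back to back.

    Returns a per-track plan and the total length in seconds.
    """
    plan = []
    offset = 0.0
    for name, start_s, end_s in tracks:
        length = end_s - start_s
        plan.append({
            "name": name,
            "begins_at": offset,            # where this voice starts in the clip
            "frames": round(fps * length),  # frames covered by this voice
        })
        offset += length
    return plan, offset


plan, total_s = plan_tracks([("man", 0, 9), ("woman", 0, 12)], fps=25)
print(total_s)  # 21.0
for entry in plan:
    print(entry)
```

Adding a third speaker is one more tuple in the list, which mirrors adding one more Load Audio + Audio Crop pair in the workflow.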
Small tips that helped
- In WAN Video Long I2V, set Motion Frames according to your output FPS: use 25 for 30 FPS, 20 for 25 FPS, 16 for 16 FPS.
- If color shifts between frames, keep ColorMatch OFF.
- Two or more speakers: add one more Load Audio + Audio Crop pair per voice with clear start and end times.
- Keep Lightning LoRA at steps 4, CFG 1 for fast tests.
- Start with 480p (WAN 2.1 I2V 480p). Upscale later if your VRAM is small.
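To keep these values in one place between test runs, a small lookup can help. The numbers come straight from the tips above; the dictionary keys and the function name `fast_test_settings` are hypothetical labels for my own notes, not ComfyUI node fields.

```python
# Motion Frames values from the tip above, keyed by output FPS.
MOTION_FRAMES_BY_FPS = {30: 25, 25: 20, 16: 16}


def fast_test_settings(fps):
    """Return the fast-test settings used in this post for a given FPS."""
    if fps not in MOTION_FRAMES_BY_FPS:
        raise KeyError(f"no Motion Frames value noted for {fps} FPS")
    return {
        "fps": fps,
        "motion_frames": MOTION_FRAMES_BY_FPS[fps],
        "steps": 4,              # Lightning LoRA
        "cfg": 1,                # Lightning LoRA
        "sampler": "DPM++ SDE",  # the sampler I stayed on
    }


print(fast_test_settings(30))
```

For an FPS that is not in the table, pick a value by hand and note what worked.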
Can you post your setup for ComfyUI: Python version, PyTorch version, and so on? I use Windows. I have not been able to get this to work with Python 3.12. It starts to run and then throws an error about missing Triton. When I try to install Triton, it always fails with “no compatible version found”.
Download the latest ComfyUI portable version: https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z
Navigate to the python_embeded folder.
Open that folder in cmd.
.\python.exe -m pip install triton
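To report a setup like the one asked for above, a short script run with the same python.exe can print the Python version and whether the packages are importable. This is a sketch using only the standard library; `environment_report` is a hypothetical helper, and it only checks that a package can be found, not that it works.

```python
import importlib.util
import platform
import sys


def environment_report(packages=("torch", "triton")):
    """Collect Python/OS details and whether each package can be found."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
        "executable": sys.executable,
    }
    for name in packages:
        # find_spec returns None when the top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Save it as a .py file and run it from the python_embeded folder so it reports the interpreter ComfyUI actually uses.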
Please help, ma’am. In Example 3 (two people talking, multi) I am using infinitetalk_multi, and WanVideoSampler fails with: ‘NoneType’ object has no attribute ‘max’. How do I fix this?
Can you send the image to my email, 23scienceinsights@gmail.com, so I can check?
Please check your email, ma’am. I have sent it.