InfiniteTalk ComfyUI Workflow (WAN 2.1): Img2Vid, Vid2Vid & Multi-Talk

Last updated: March 8, 2026 8:20 pm

By Esha Sharma


InfiniteTalk is a talking-video system: you give it an image or an existing video plus an audio track, and it generates a lip-synced clip. Ready-made ComfyUI nodes and workflows let you run it inside ComfyUI. It is built around the Wan 2.1 I2V pipeline and uses an audio encoder (Wav2Vec2) to drive mouth and face motion.

What I’m doing

I run InfiniteTalk inside ComfyUI to get three results:

A still photo to a talking video
Swap new audio on an old video
Two people talking in the same scene

Files you need

Wan2_1-InfiniTetalk-Single_fp16.safetensors (one speaker)
Wan2_1-InfiniteTalk-Multi_fp16.safetensors (two or more speakers)

Put the file you use in: ComfyUI/models/diffusion_models/.

Base model and fast preset

Base: WAN 2.1 I2V 480p (also works with WAN 2.1 Fusion X and WAN 2.1 720p)
Encoders: Use the same WAN 2.1 text encoder and the same WAN 2.1 VAE you already use
Speed: Lightning LoRA, steps = 4, CFG = 1
Samplers that stayed stable on my card: DPM++ SDE, LCM, FlowMatch. I used DPM++ SDE for all the examples below.

All WAN 2.1 model files: (https://aistudynow.com/how-to-run-wan-2-1-fusionx-gguf-advanced-comfyui-workflow-on-low-vram/)

I also made a quick video tutorial that walks through this InfiniteTalk workflow inside ComfyUI.

Workflow

Load an image or load a video.
Load the MP3.
Open Resolution Master and press Auto so it sets the output size from your image's aspect ratio. If you want a standard size, pick a preset instead. Either way, you don't need to enter width and height by hand.
In the audio group, set start and end with Audio Crop.
The small math node reads your FPS and the audio length and fills the frame count by itself.
Pick the right InfiniteTalk weight in its node (single or multi).
Render.
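The frame-count math in step 5 can be sketched in a few lines. This is a minimal illustration, not the node's actual code: it assumes the node multiplies the cropped audio length by the FPS and rounds to a whole frame. The function name `frames_for_clip` is hypothetical. With Example 2's numbers (12 s at 30 FPS) it gives the 360 frames the node filled in.

```python
def frames_for_clip(fps: float, crop_start_s: float, crop_end_s: float) -> int:
    """Hypothetical stand-in for the workflow's small math node.

    Assumes frame count = cropped audio duration x FPS, rounded to the
    nearest whole frame. The real node may round differently.
    """
    if crop_end_s <= crop_start_s:
        raise ValueError("Audio Crop end must be after start")
    duration_s = crop_end_s - crop_start_s
    return round(duration_s * fps)


if __name__ == "__main__":
    # Example 2: Audio Crop 0 to 12 s, video kept at 30 FPS
    print(frames_for_clip(30, 0, 12))  # 360
```

Because the count comes straight from the Audio Crop window, changing the crop is the only thing you need to touch to change the clip length.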

Example 1 — image to talking video (single)

Photo size: 1792 × 2368
Resolution Master (Auto) set it to 720 × 960 (fine for WAN 2.1)
Audio: 42 s → Audio Crop from 0 to 42
InfiniteTalk file: infinite_talk_single.safetensors
Sampler: DPM++ SDE
Lightning: steps 4, CFG 1

What I saw: the lips match the words from start to end, with small blinks and small head movements. On my GPU it took a bit over 20 minutes and used about 13–16 GB of VRAM.
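Auto in Resolution Master turned a 1792 × 2368 photo into 720 × 960. One plausible rule that reproduces this is sketched below: scale so the long edge is 960 px, keep the aspect ratio, and snap each edge to a multiple of 16. The 960 px target, the multiple of 16, and the name `fit_long_side` are assumptions for illustration; this is not confirmed to be how the node computes its size.

```python
def fit_long_side(width: int, height: int, long_side: int = 960,
                  multiple: int = 16) -> tuple[int, int]:
    """Hypothetical "Auto" sizing rule, not Resolution Master's code.

    Scales (width, height) so the longer edge equals long_side, keeping
    the aspect ratio, then snaps each edge to the nearest multiple.
    """
    scale = long_side / max(width, height)

    def snap(value: float) -> int:
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width * scale), snap(height * scale)


if __name__ == "__main__":
    # Example 1: 1792 x 2368 portrait photo
    print(fit_long_side(1792, 2368))  # (720, 960)
```

Under the same hypothetical rule, Example 2's 1920 × 1080 source with an 832 px long edge would come out as 832 × 464, so picking the 832 × 480 preset there is a deliberate choice rather than an Auto result.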

Example 2 — video to video (new audio, one speaker)

Source video: 1920 × 1080, 30 FPS, 998 frames
Target size: preset 832 × 480
New audio: 27 s, but I only need 12 s → Audio Crop 0 to 12
FPS: keep 30 FPS with get_fps ON. The math node fills in 360 frames (12 s × 30 FPS).
InfiniteTalk file: single
Lightning: steps 4, CFG 1
Sampler: DPM++ SDE
Prompt: “looking at the phone, natural review expression”

What I saw: when he says “pixel user,” the mouth shape lands on time. Pauses also look right. It reads like native speech, not a dub.
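Before a video-to-video render, it can help to confirm that the cropped audio doesn't ask for more frames than the source video has. The helper below is hypothetical (it is not a node in the workflow) and assumes the output keeps the source FPS, as it does here with get_fps ON, and starts at the source's first frame.

```python
def check_source_covers_audio(source_frames: int, fps: float,
                              crop_start_s: float, crop_end_s: float) -> int:
    """Return the frames needed for the cropped audio.

    Hypothetical sanity check: raises ValueError if the source video is
    too short for the Audio Crop window at the given FPS.
    """
    needed = round((crop_end_s - crop_start_s) * fps)
    if needed > source_frames:
        raise ValueError(f"need {needed} frames, source has only {source_frames}")
    return needed


if __name__ == "__main__":
    # Example 2: 998-frame source at 30 FPS, audio cropped to 0-12 s
    needed = check_source_covers_audio(998, 30, 0, 12)
    print(needed, "of", 998, "frames used")  # 360 of 998 frames used
```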

Example 3 — two people talking (multi)

InfiniteTalk file: infinite_talk_multi.safetensors
Base: WAN 2.1 I2V 480p (same encoder, same VAE)
One photo: a man and a woman in a car. I press Auto in Resolution Master so the size is set from the photo.

Two audio tracks:

Man: 0 to 9 s
Woman: 0 to 12 s

Each voice gets Load Audio and Audio Crop. The math node sets the frame counts from your FPS and length.
Lightning: steps 4, CFG 1
Sampler: DPM++ SDE

What I saw: about 21 s in total (9 s + 12 s). When the man talks, the woman looks at him; when she talks, he turns toward her. Lip-sync stays steady.
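The roughly 21 s result lines up with the two cropped clips (9 s and 12 s) playing one after the other. Here is a minimal sketch of that turn-taking arithmetic, assuming the voices are placed back to back with no overlap or gap. `Turn` and `sequential_timeline` are hypothetical names, and the 25 FPS in the usage lines is an assumed value, since the example doesn't state its FPS.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    crop_start_s: float  # Audio Crop start for this voice
    crop_end_s: float    # Audio Crop end for this voice

    @property
    def duration_s(self) -> float:
        return self.crop_end_s - self.crop_start_s


def sequential_timeline(turns: list[Turn],
                        fps: float) -> list[tuple[str, float, float, int]]:
    """Place each cropped clip right after the previous one.

    Returns (speaker, start_s, end_s, frames) per turn. Illustration only;
    it assumes strict turn-taking, not how the multi-speaker nodes work.
    """
    timeline = []
    t = 0.0
    for turn in turns:
        end = t + turn.duration_s
        timeline.append((turn.speaker, t, end, round(turn.duration_s * fps)))
        t = end
    return timeline


if __name__ == "__main__":
    # Example 3: man cropped 0-9 s, woman cropped 0-12 s (25 FPS assumed)
    turns = [Turn("man", 0, 9), Turn("woman", 0, 12)]
    for speaker, start, end, frames in sequential_timeline(turns, fps=25):
        print(f"{speaker}: {start:.0f}-{end:.0f} s, {frames} frames")
```

Each voice keeps its own Audio Crop window (both start at 0 s in their own file), while its position in the final clip depends on the turns before it.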

Small tips that helped

In WAN Video Long I2V, set Motion Frames based on your output FPS: use 25 for 30 FPS, 20 for 25 FPS, and 16 for 16 FPS.
If color shifts between frames, keep ColorMatch OFF.
Two or more speakers: add one more Load Audio + Audio Crop pair per voice with clear start and end times.
Keep Lightning LoRA at steps 4, CFG 1 for fast tests.
If your VRAM is limited, start at 480p (WAN 2.1 I2V 480p) and upscale later.
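If you script your runs, the Motion Frames tip can be kept as a small lookup table. This is just the three pairs listed above; `motion_frames_for` is a hypothetical helper, and it deliberately refuses other FPS values rather than interpolating, since only these three settings were tested.

```python
# Motion Frames values from the tip above, keyed by output FPS.
MOTION_FRAMES_BY_FPS = {30: 25, 25: 20, 16: 16}


def motion_frames_for(fps: int) -> int:
    """Look up Motion Frames for WAN Video Long I2V (hypothetical helper).

    Only the three FPS values from the tip are known; anything else
    raises instead of guessing.
    """
    try:
        return MOTION_FRAMES_BY_FPS[fps]
    except KeyError:
        raise ValueError(f"no tested Motion Frames value for {fps} FPS") from None


if __name__ == "__main__":
    print(motion_frames_for(30))  # 25
```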

Download: Multi Talk Wan2.1 workflow (aistudynow.com)

Download: Unlimited Talk Single workflow (aistudynow.com)


Studied Computer Science. Passionate about AI, ComfyUI workflows, and hands-on learning through trial and error. Creator of AIStudyNow, sharing tested workflows, tutorials, and real-world experiments. Also on Dev.to and GitHub.

15 Comments

Paul Phoenix says:

August 27, 2025 at 12:49 am

How do I bypass sageattention? Thanks in advance.

Wade says:

August 27, 2025 at 7:22 am

Can you post your ComfyUI setup: Python version, PyTorch version, etc.? I use Windows. I have not been able to get this to work with Python 3.12. It starts to run and then gives an error about missing Triton. When I try to install Triton, it always fails with “no compatible version found”.

- Esha Sharma says:
  
  August 27, 2025 at 10:13 pm
  
  Download the latest ComfyUI portable version: https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z
  Navigate to the python_embeded folder.
  Open a command prompt in that folder.
  Run: .\python.exe -m pip install triton-windows
  
hari says:

August 27, 2025 at 12:15 pm

Please help me, ma’am: I’m using infinitetalk_multi and WanVideoSampler throws “‘NoneType’ object has no attribute ‘max’”. How do I fix it?

hari says:

August 27, 2025 at 12:27 pm

About Example 3 (two people talking, multi): please help me, ma’am. I’m using infinitetalk_multi and WanVideoSampler throws “‘NoneType’ object has no attribute ‘max’”. How do I fix it?

- Esha Sharma says:
  
  August 27, 2025 at 10:09 pm
  
  Can you send an image of the error to my email, 23scienceinsights@gmail.com, so I can check?
  
  - hari says:
    
    August 29, 2025 at 2:43 am
    
    Please check your email, ma’am. I’ve sent it.
    
CH Nisar says:

August 28, 2025 at 2:13 pm

Hello Dear Esha,

First of all, I would like to sincerely thank you for sharing your workflow and valuable guidance. I am currently using your workflow (Unlimited Talk – Single AI, studynow.com) and have also downloaded all the recommended models to ensure proper setup.

Here is my current process:

I uploaded my character image.

I added an audio file (9 seconds in length).

I did not change or modify any settings.

I simply pressed the RUN button to generate the output.

The process completed successfully; however, the output video shows clean, correct results only for the first two seconds. After that, the video becomes heavily distorted with strong noise, while the audio plays perfectly throughout.
I have attached a screenshot for reference: (https://i.postimg.cc/VNRPNhYs/Screenshot-2025-08-28-170224.png)

My system specifications:

Operating System: Windows 11 Pro

GPU: NVIDIA RTX 5070

RAM: 64 GB DDR5

ComfyUI Installation: Manually installed Step by Step (not using the portable version)

Could you please guide me on how I can resolve this issue? I would greatly appreciate any troubleshooting steps or adjustments you might recommend.

Thank you very much for your time and support.

Best regards,
Ch Nisar

rez says:

August 31, 2025 at 3:04 pm

Hi, I get an error in ComfyUI about the “ResolutionMaster” node; I don’t have it. Can you please share a download link for this node? I tried installing it from within ComfyUI, but after restarting ComfyUI the problem is still there.

Avi says:

September 6, 2025 at 7:15 am

Curious if you tried it without the lightx2v LoRA? Was the video quality better, the same, or worse?

Avi says:

September 6, 2025 at 11:36 am

Curious if you tried generating without the LightX LoRA? What was the quality like without it?

- Esha Sharma says:
  
  September 7, 2025 at 4:34 pm
  
  About the same. I had checked it.
  
  - Avi says:
    
    September 8, 2025 at 4:09 am
    
    Got it. Also, did you get a chance to compare LightX2v and FusionX? Any idea which one turned out better?
    
Sergey Ariev says:

September 12, 2025 at 4:03 pm

Hi! Does this model only handle square formats well? I tried 16:9 and the characters don’t come out very well, unlike with square or vertical formats.

tewfik says:

September 21, 2025 at 2:50 pm

Hey, many thanks for sharing the workflows and detailed instructions! (and thanks for the other resources you’re posting on youtube and on this website, very helpful learning how it works)
I was able to run the workflow, but the video looks slow-motion and the audio isn’t synced (although lip-sync was generated). I tried 48 kHz and 44.1 kHz audio. Any ideas why this is happening?
Thanks :)

