WAN 2.2 Animate is a character-focused video model. It takes motion and expression from a driver video and applies it to your subject while keeping identity steady, so the face feels stable and the head and body follow the reference. Under the hood, Wan 2.2 uses a Mixture-of-Experts setup that improves quality across denoising steps, which is why the motion transfer looks clean when you set it up right.
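For context, that MoE setup is usually described as a high-noise expert handling the early denoising steps and a low-noise expert taking over for the late ones. Here is a tiny sketch of that switching idea; the 0.5 boundary and the expert names are my own illustrative assumptions, not Wan 2.2's actual configuration.

```python
# Minimal sketch of the two-expert MoE idea: pick an expert based on how far
# along the sampler is. Boundary and names are assumptions for illustration.
def pick_expert(step: int, total_steps: int, boundary: float = 0.5) -> str:
    # Early, high-noise steps shape layout and motion; later, low-noise steps
    # refine detail and identity.
    progress = step / max(total_steps - 1, 1)
    return "high_noise_expert" if progress < boundary else "low_noise_expert"

for step in range(8):
    print(step, pick_expert(step, 8))
```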
There are ready builds for Animate on Hugging Face, including quantized GGUF variants that load well in ComfyUI with a GGUF loader. Repack notes also list where to place the text encoder and VAE for a working stack.
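If you prefer scripting the downloads, here is a hedged sketch with huggingface_hub. Only the filenames and target folders come from the list in the next section; the repo ids are placeholders you need to swap for the actual Hugging Face repos that host these files.

```python
# Sketch: pull the Animate files into the ComfyUI model folders.
# repo_id values are placeholders (assumptions) -- check the model pages.
from huggingface_hub import hf_hub_download

COMFY = "ComfyUI/models"

hf_hub_download(
    repo_id="YOUR_ORG/Wan2.2-Animate-repack",  # assumption: replace with the real repo
    filename="Wan2_2-Animate-14B_fp8_e4m3fn_scaled_KJ.safetensors",
    local_dir=f"{COMFY}/diffusion_models",
)
hf_hub_download(
    repo_id="YOUR_ORG/Wan2.1-Uni3C",           # assumption: replace with the real repo
    filename="Wan21_Uni3C_controlnet_fp16.safetensors",
    local_dir=f"{COMFY}/controlnet",
)
```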
I kept the workflow simple. One graph. You load your video. You load your subject image. You add the mask with green points. You pick models. You run. The swap looks natural. Faces stay the same. Motion reads right.
Files you need
WAN 2.2 Animate models
- Wan2.2-Animate-14B FP8 model: Wan2_2-Animate-14B_fp8_e4m3fn_scaled_KJ.safetensors
Folder: ComfyUI/models/diffusion_models
- Optional for low VRAM: Wan2.2 I2V GGUF Q4 or Q5 pairs (see the QuantStack GGUF repo)
Folders often used: ComfyUI/models/unet or ComfyUI/models/diffusion_models
InfiniteTalk for lip sync
- Wan 2.1 InfiniteTalk: Wan2_1-InfiniteTalk-Single_fp8_e4m3fn_scaled_KJ.safetensors
- GGUF if you run a GGUF Animate build: Wan2_1-InfiniteTalk_Multi_Q8.gguf
Folder: ComfyUI/models/diffusion_models
Uni3C for camera motion
- Wan21_Uni3C_controlnet_fp16.safetensors
Folder: ComfyUI/models/controlnet
LoRAs used in this flow
- Relighting LoRA for Animate to match scene color and tone: WanAnimate_relight_lora_fp16.safetensors
- LightX 4-step LoRA set for Animate or I2V: low_noise_model.safetensors
- Pusa v1 motion LoRA if you want stronger motion: Wan21_PusaV1_LoRA_14B_rank512_bf16.safetensors
- HPS LoRA for human-preference quality tuning: Wan2.2-Fun-A14B-InP-LOW-HPS2.1_resized_dynamic_avg_rank_15_bf16.safetensors
The workflow layout
Background masking
- Load your video in the Video section.
- In Resolution Master, pick a supported size for the model and your GPU. It helps you stay inside safe memory.
- Open the Point Editor. Use green points for what you want in the mask and red points for what you want to exclude. Keep only the object you plan to replace.
- Prefer points for speed. If you like layer masks, use SegmentAnything Ultra v3 style nodes in the Segments v3 group. For this tutorial we stay with the SAM point-to-segment nodes; a standalone sketch of the same idea follows this list.
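Outside ComfyUI, the same green-include / red-exclude point idea can be reproduced with the segment-anything library. The checkpoint path, frame filename, and point coordinates below are example assumptions, not values from this workflow.

```python
# Point-prompted masking with segment-anything: label 1 = keep (green point),
# label 0 = exclude (red point). Paths and coordinates are placeholders.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # assumption: local checkpoint
predictor = SamPredictor(sam)
predictor.set_image(np.array(Image.open("frame_0001.png").convert("RGB")))

points = np.array([[420, 260], [430, 520], [760, 400]])  # example pixel coords
labels = np.array([1, 1, 0])  # two green points on the subject, one red on the background

masks, scores, _ = predictor.predict(
    point_coords=points, point_labels=labels, multimask_output=True
)
mask = masks[scores.argmax()]  # keep only the object you plan to replace
```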
Face motion from a driver video
- If you do not like the expression in your first video, load another short face video as the driver.
- Plug the Image hook from that driver clip into the Face Image section group.
- If you want to stick to the first video’s face motion, connect the first video back to that hook.
Lip sync
- Enable InfiniteTalk with WAN 2.2 Animate when you want stronger lip sync. It is an audio-driven model that drives mouth and subtle head motion.
- If you run Animate in GGUF, pair it with the Wan 2.1 InfiniteTalk GGUF build to match the loader.
Camera motion
- Turn on WanVideo Uni3C when you want a camera pan or a subtle angle change. It transfers camera movement from a reference clip. In ComfyUI you load the Uni3C ControlNet file and connect the control video.
- It is still evolving. Results vary across builds.
Model section
- Pick WAN 2.2 Animate FP8 for quality. If your GPU is tight on VRAM, try Q4 GGUF (a quick VRAM check sketch follows this list).
- Load LoRAs:
  - Relighting to fix color tone and lighting match
  - LightX 4-step for speed and motion stability
  - Pusa v1 if you need more motion
  - HPS if you want a human-preference tilt toward realism
- Text encoder and VAE:
  - Use the WAN text encoder and VAE that your Animate build expects. See the model page for exact pairs.
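The FP8-versus-GGUF choice above boils down to a VRAM check. Here is a rough sketch of that decision; the 20 GB cutoff is my guess, not an official requirement.

```python
# Rough sketch of the FP8-vs-GGUF pick as a VRAM check. Cutoff is an assumption.
import torch

def pick_animate_build() -> str:
    if not torch.cuda.is_available():
        return "Q4 GGUF (CPU offload, expect slow runs)"
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb >= 20:  # assumption: a roomy card can hold the FP8 build
        return "Wan2_2-Animate-14B_fp8_e4m3fn_scaled_KJ.safetensors"
    return "Q4 or Q5 GGUF from the QuantStack repo"

print(pick_animate_build())
```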
Guitar test with image swap
Reference image: Gollum cut out with Remove Background.

Video: a woman playing guitar.
Masking: green points on the woman shape, red points on the guitar and background edges.
Run.
The DWPose node reads face and body keypoints and passes the motion through, so the hands keep the strum beat while you swap the subject.
Q4 GGUF vs FP8
First try on Q4 took about 28 seconds to show a first result. I had the frame cap stuck at 37 by mistake. I fixed it to 153 frames and it split across two runs. Total was around one minute on my side.
Quality matched the FP8 sample. Motion looked better once the frame cap was corrected.
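Why 153 frames split into two runs: the sampler renders in fixed frame windows. The 81-frame window below is an assumption to show the arithmetic, not the exact default of the Animate nodes.

```python
# Split a total frame count into sampler runs. Window size is an assumption.
import math

def runs_needed(total_frames: int, window: int = 81) -> int:
    return math.ceil(total_frames / window)

print(runs_needed(37))   # 1 run  (the stuck frame cap)
print(runs_needed(153))  # 2 runs (after fixing the cap)
```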
If your card is small, Q4 is a real option. The QuantStack GGUF repo lists Q2 to Q8 variants so you can move up or down by VRAM.
Change the face motion with a driver clip
I loaded a new face driver video where the head bobs up and down.
Swapped the hook to this driver.
On the first run the motion only carried over partway.
I ran again with a tiny prompt nudge and it followed the driver 100 percent.
That is the quick trick: when the expression is wrong, load another driver and reconnect the hook.
Lip sync with InfiniteTalk
Turn on the InfiniteTalk group.
Video: a woman singing. Reference image: a woman portrait.

First I check only the mask. Then I enable the models and pick the InfiniteTalk build.
If Animate is GGUF, select the InfiniteTalk GGUF too.
Run. Lip sync locks in. You can zoom close and it still looks blended. Necklace, clothes and face match the reference look.
Test Uni3C for camera motion
Enable the Uni3C group.
Load a short clip that has camera angle changes.
I used 720 by 720.
On my side the result was mostly static with a small drift. That happens sometimes. Try different driver clips; the Uni3C paper and loaders are new and moving fast.
FAQ
Can I use this on a very small GPU?
Yes. Try the Q4 GGUF builds and smaller resolutions.