How to Use Wan 2.2 Fun ControlNet in ComfyUI Workflow

By Esha

Alibaba just dropped two new models — Wan2.2-Fun-A14B-InP and Wan2.2-Fun-A14B-Control — and I’ve been testing the Control version. It’s a 14B parameter model built for one thing: letting you take control of video generation.

You can feed it:

  • Canny edges
  • Depth maps
  • Pose data
  • Trajectory inputs

The surprising part? It actually listens. Plenty of video models start ignoring your control halfway through, but this one keeps following it frame by frame.

It’s trained at 512, 768, and 1024 resolutions, supports up to 81 frames at 16 FPS, and works with multi-language prompts. I decided to build a ComfyUI workflow using the WanFunControlToVideo node. Everything connected fine… but right now, it’s not fully supported in that form.

So, I went with a different setup. It’s heavy on VRAM — over 12 GB — but it runs. I’ll share an optimized, low VRAM version later, but for now this is the working method.

I made a quick video tutorial showing the Wan 2.2 Fun ControlNet workflow inside ComfyUI. You can watch it below.

Getting Wan 2.2 Fun ControlNet Running in ComfyUI

If you want to run Wan 2.2 Fun ControlNet locally inside ComfyUI, the setup’s simple.

Install the Required Node

First, you’ll need a custom node called VideoX-Fun:

  1. Open the Manager in ComfyUI.
  2. Go to Custom Node Manager.
  3. Search for VideoX-Fun.
  4. Click Install.

Once that’s done, you can open your workflow.

Choosing the Right Model

In the Load Wan 2.2 Fun Model node, you’ll see two options:

  • Wan 2.2 Fun A14B InP
  • Wan 2.2 Fun A14B Control

For control tasks, select Wan 2.2 Fun A14B Control. If you try to run it without the files, you’ll get a “Please download the Fun model” message — because ComfyUI won’t pull them automatically.

Downloading the Model Files

Here’s the correct way to set them up:

  1. Inside ComfyUI/models/, create a new folder called Fun_Models (exact spelling).
  2. Go to the Hugging Face Wan2.2-Fun-A14B-Control page.
  3. Open the Files and Versions tab.
  4. Download all files from the Wan2.2-Fun-A14B-Control folder.
  5. Place them into:

ComfyUI/models/Fun_Models/Wan2.2-Fun-A14B-Control

(Folder name must match exactly.)

Tip: You can skip manual downloads by cloning the repository. On Hugging Face, click Clone repository, copy the Git command, open Fun_Models in CMD, paste it, and it will pull everything for you.
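
If you prefer scripting the download over clicking through the browser, the huggingface_hub package can pull the whole repository in one call. A minimal sketch; the repo id below is an assumption based on the model page title, so confirm it against the actual Hugging Face page:

    import os
    from huggingface_hub import snapshot_download

    # Target folder must match what the Load Wan 2.2 Fun Model node expects.
    target = "ComfyUI/models/Fun_Models/Wan2.2-Fun-A14B-Control"
    os.makedirs(target, exist_ok=True)

    # repo_id is assumed from the model page title -- double-check it there.
    snapshot_download(repo_id="alibaba-pai/Wan2.2-Fun-A14B-Control", local_dir=target)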

GPU Memory Options and Precision

There are also a few GPU memory options, and these matter if you're not running a monster GPU (a rough sketch of what they map to follows the list). For example:

  • model_full_load puts everything on the GPU (fastest, but heavy).
  • model_cpu_offload moves it back to the CPU after use to save VRAM.
  • model_cpu_offload_and_qfloat8 does the same but quantizes the transformer to float8 for even more VRAM savings.
  • sequential_cpu_offload loads each layer one at a time (slow, but works on smaller cards).
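
To make the trade-offs concrete, here is a rough sketch of what these modes map to in a generic diffusers-style pipeline. This is not the VideoX-Fun loader itself (the node does all of this for you), and it assumes the checkpoint folder loads as a diffusers-format pipeline, which may not hold for this model; it is only meant to illustrate the options:

    import torch
    from diffusers import DiffusionPipeline

    # Hypothetical local path -- point this at your own checkpoint folder.
    model_dir = "ComfyUI/models/Fun_Models/Wan2.2-Fun-A14B-Control"
    pipe = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)

    # model_full_load: keep everything on the GPU for the whole run (fastest).
    # pipe.to("cuda")

    # model_cpu_offload: move each component to the GPU only while it runs,
    # then back to CPU RAM afterwards -- lower peak VRAM, slightly slower.
    pipe.enable_model_cpu_offload()

    # sequential_cpu_offload: stream individual layers to the GPU one at a time.
    # Slowest option, but it fits on small cards.
    # pipe.enable_sequential_cpu_offload()

    # The qfloat8 variant additionally stores the transformer weights in
    # float8; the VideoX-Fun node handles that quantization for you.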

Under Precision, you’ll see FP16 and BF16.
FP16 is the safer choice — it’s supported on more hardware and runs a bit faster. BF16 can help in some setups, but only if your GPU supports it natively.

So yeah… if you’re hitting out-of-memory errors, try sequential mode first. And unless you know you need BF16, just stick with FP16.
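
Not sure whether your card handles BF16 natively? A quick check you can run in the same Python environment ComfyUI uses:

    import torch

    # Ampere-class (RTX 30xx) and newer NVIDIA GPUs report native BF16 support.
    print(torch.cuda.get_device_name(0))
    print("BF16 supported:", torch.cuda.is_bf16_supported())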

Reference Images and Control Videos

Once the model is loaded, the first thing you’ll see is the Image Section.
This is where you can upload a reference image — the look, clothing, or background you want your final video to match. If you don’t need it, you can just disable it using the Fast Group Bypasser.

Then there’s the Control Video Group. This is where the real magic happens. You can feed it:

  • Canny edge sequences
  • Depth maps
  • Pose data

You can even go for trajectory control, which lets you guide camera movement or object paths inside the video.

For the length, I keep it at 81 frames and set FPS to 16. If you’re tracking motion from a source video, DWPose Estimator works great — it tracks body keypoint positions so the movement stays locked in.
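
For reference, that frame count and frame rate work out to roughly a five-second clip:

    # 81 frames at 16 FPS is about a five-second clip.
    frames, fps = 81, 16
    print(f"{frames / fps:.2f} seconds")  # 5.06 seconds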

You can also combine control nets. For example:

  • Realistic Lineart AIO Aux Preprocessor for edges
  • Depth Anything v2 for depth info

If you only need one control net, bypass the blend node and connect your image directly to Resize Image v2. With two control nets, connect one to Image 1 and the other to Image 2.
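
If you're curious what the blend step amounts to, it is essentially a weighted mix of the two control maps, frame by frame. The blend node does this for you inside ComfyUI; the sketch below only illustrates the idea, and the file names are made up:

    import numpy as np
    from PIL import Image

    # Hypothetical file names for one frame of each control sequence.
    lineart = np.asarray(Image.open("lineart_0001.png").convert("RGB"), dtype=np.float32)
    depth = np.asarray(Image.open("depth_0001.png").convert("RGB"), dtype=np.float32)

    # Equal weights; shift the balance toward whichever signal matters more.
    blended = 0.5 * lineart + 0.5 * depth
    Image.fromarray(blended.astype(np.uint8)).save("control_0001.png")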

One thing to note — Depth Anything makes a depth map for every frame, and if your reference image background doesn’t match, you’ll end up with a mix of both. That’s why I stick with DWPose when I want the background to stay exactly like the reference.

Sampler Settings, TeaCache, and Real Tests

For the Wan 2.2 Fun ControlNet sampler I keep CFG at 5 and Steps at 20. That hits a decent balance between speed and detail.
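
If you were driving the model from a script rather than the sampler node, those two numbers map onto the usual guidance and step parameters. A hedged sketch that reuses the hypothetical diffusers-style pipeline from the offload example above:

    # 'pipe' is the hypothetical diffusers-style pipeline from the offload sketch.
    video = pipe(
        prompt="A young woman dances down a bright modern hallway",
        num_frames=81,            # frame count used in the workflow
        num_inference_steps=20,   # Steps
        guidance_scale=5.0,       # CFG
    ).frames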

There’s a toggle called TeaCache. I tested it on and off — honestly, the results looked the same. No quality drop, no time difference. So I just leave it enabled.

Dance test (reference image + video)

I used a reference image of a woman in a red-and-green checkered coat (modern hallway, soft light). Then I uploaded a dancing video and ran DWPose Estimator for motion tracking.

For the prompt I told the AI:

A young woman with long wavy blonde hair is wearing a red and green checkered coat. She dances happily while walking down a modern indoor hallway with big glass walls and soft light. Her coat moves and swings as she dances. In the background, there are warm glowing lights and reflections, giving the scene a beautiful, movie-like look.

Result: it matched the reference and the motion.

  • The coat matched perfectly.
  • Background matched the image — big glass walls, warm lights, reflections.
  • The coat moves and swings naturally with the dance.
  • Motion stays consistent with the video reference.
  • No strange clothing artifacts.

I even tried it with TeaCache disabled — same quality, same timing.

Video-to-Video example (snowy scene)

So I uploaded a new video — a young woman walking outdoors in the snow.

She’s wearing a black high-neck top with brown velvet sleeves, matching high-waisted black leather pants, hands in her pockets. Behind her is a line of leafless trees and a snow-covered pathway.

For the prompt I told the AI:

A young woman with long black hair, wearing a light green sweater and a white shirt under it, with a pink shirt tied with a bow, walking outdoors in the snow. Her hands are in her pockets, and behind her, trees line and a snow-covered pathway.

And the result?
It followed the prompt really well.

  • Light green sweater exactly as described.
  • White and pink shirt details match perfectly.
  • Leafless trees in the background.
  • Snow-covered pathway looks natural.
  • Motion stays consistent with the source video.

Bottom line: with Wan 2.2 Fun ControlNet, the control video anchors the motion. You can change the look and clothing with a prompt and reference, but the movement remains true to the source.

Wan 2.2 Fun ControlNet Workflow Free Download

Resource ready for free download! Sign up with your email to get instant access.
2 Comments
  • Hi Esha! Thanks a lot for sharing your knowledge 🙂 I’m trying to test the workflow, but I get an “Import Fail” error when installing the VideoX-Fun custom node. I’ve tried every available version (latest, nightly, etc.) and still hit the same problem, and “Try Fix” doesn’t help either. Would you know what I should do?
    Thank you a lot in advance!

    • You can try installing it manually. Go to the VideoX-Fun folder inside ComfyUI/custom_nodes, open it in Command Prompt, and run:
      pip install -r requirements.txt
