How to Write Better WAN 2.2 Prompts — Tested in ComfyUI


Let’s start here: WAN 2.2 isn’t just a small upgrade from 2.1—it’s a huge improvement.

If you’ve played with WAN 2.1 before, you probably remember its strengths (dense diffusion, decent motion) and its limits (camera movement was hit-or-miss, and details often melted if you pushed it too hard). 2.2 changes that in a big way — especially if you’re running local on a 5090.

Here’s what stood out right away:

  • The new MoE backbone actually feels smarter. You can tell it’s handling noise levels better mid-generation — especially on finer details.
  • Frame quality is sharper across the board. It’s subtle until you compare side-by-side, and then it’s very obvious.
  • Motion? Way smoother. Multi-object scenes don’t fall apart as easily. Parallax depth actually holds.
  • And the 5B hybrid model? It runs 720p @ 24fps with just 8GB VRAM using offload — perfect for prototyping without a cloud bill.

They also fed it a much bigger dataset — something like 65% more images and 83% more videos — and it shows.

If you’ve been using WAN for anything cinematic (or just want to), 2.2 finally makes that reliable. It still has its quirks, but there’s a clear jump in control, especially with camera language.

How WAN 2.2 Compares to 2.1 (Quick Overview)

You can dig into the release notes for the full list, but here’s the short version of the WAN 2.2 vs 2.1 upgrades — from someone who actually tested both:

| Feature | WAN 2.1 | WAN 2.2 |
| --- | --- | --- |
| Core architecture | Dense diffusion | MoE diffusion (expert hand-off mid-denoise) |
| Training data | Smaller baseline | +65.6% images, +83.2% videos |
| Aesthetic tags | Basic labels | Cinematic-level control: lighting, color, camera |
| Motion fidelity | Middling | Big upgrade: smoother motion, multi-object scenes hold up |
| Model lineup | 14B T2V, I2V | Adds 5B TI2V hybrid that runs 720p locally |

What does all this mean in practice?

  • You don’t have to fight the prompt system as much.
  • You get more cinematic outputs by default.
  • You can actually iterate without spinning up a beefy server.

Also, if you’re trying WAN 2.2 on ComfyUI, here’s a direct link to the WAN 2.2 TI2V 5B model on Hugging Face. That’s the one that’s been the most stable for local runs so far.

The Prompting Meta Has Changed

Prompting in WAN 2.2 isn’t about stuffing more words in — it’s about getting the structure right.

The model is way more assertive now. If you under-specify something, it doesn’t just leave it vague — it fills in its own “cinematic defaults.” Sometimes that looks great. Other times it decides to throw in lens flares or a weird zoom you never asked for.

So the goal is to guide it just enough.

What’s been working best is a structure that hits six things clearly: shot order, camera language, motion modifiers, aesthetic tags, temporal and spatial settings, and the negative prompt. Not bloated, just intentional. Around 80–120 words total.

Here’s how I’ve been thinking about it:

Shot Order

Always lead with what the camera sees first. Then what the camera does. Then what’s revealed. It’s not just about description — it’s a pacing thing. WAN 2.2 responds better when you keep the “visual sequence” in order.

Example:

“Wide shot of a lone figure standing in a wheat field at dusk. Camera cranes up slowly to reveal distant mountains under purple skies.”

Don’t jumble it like:
“There’s a sunset behind mountains and someone standing, the camera goes up.”
That gets interpreted in weird ways.

Camera Language

WAN 2.2 actually listens to these now. 2.1 ignored most of them unless you got lucky.

The verbs I’ve had the most success with:

  • pan left / pan right
  • tilt up / tilt down
  • dolly in / dolly out
  • orbital arc
  • crane up

Simple words, but they matter. And don’t stack too many together — WAN tends to favor the first movement and fade the rest.
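
Since stacked movements tend to collapse into the first one, I've started running a quick lint before queueing anything with choreography. A minimal sketch; the helper and verb list are my own, not part of any WAN or ComfyUI tooling:

```python
import re

# The movement verbs that have worked reliably for me (same list as above).
CAMERA_VERBS = [
    "pan left", "pan right", "tilt up", "tilt down",
    "dolly in", "dolly out", "orbital arc", "crane up",
]

def camera_moves(prompt: str) -> list[str]:
    """Find known camera verbs, tolerating simple 's' conjugations ('pans left')."""
    text = prompt.lower()
    found = []
    for verb in CAMERA_VERBS:
        head, _, tail = verb.partition(" ")
        if re.search(rf"\b{head}s?\s+{tail}\b", text):
            found.append(verb)
    return found

prompt = "Wide shot of a market. Camera pans left, then cranes up to the rooftops."
moves = camera_moves(prompt)
if len(moves) > 1:
    print(f"warning: {len(moves)} stacked moves {moves}; WAN may favor only the first")
```

Nothing fancy, but it catches the "two moves in one sentence" habit before you burn a generation on it.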

Motion Modifiers

This part adds energy.

You can guide the pacing by dropping in speed adjectives like:

  • slow-motion
  • rapid whip-pan
  • time-lapse

And if you want parallax (foreground/background depth), use contrasts like:
“foreground grass rustles; background ruins stay still”
or
“fog drifts in foreground, city lights static in distance”

This helps avoid that flat “everything moves the same way” problem.

Aesthetic Tags

This is where WAN 2.2 really separates itself.

You can shape the whole mood with just a few well-placed terms — lighting, color, lens type, film emulation.

Stuff like:

  • Lighting: volumetric dusk, neon glow, harsh noon sun
  • Color grade: teal-and-orange, bleach-bypass, Kodak Portra
  • Lens/style: anamorphic bokeh, 16mm grain, CGI stylized

These get interpreted way more faithfully than before. You don’t need to overload them — just one or two usually shapes the look.

Temporal and Spatial Settings

Frame length matters more than it used to. WAN 2.2 tends to collapse detail or motion timing if the clip runs too long.

The sweet spot for most scenes:

  • Frame count: stick to 120 or less
  • Resolution: use 960×540 for quick testing; 1280×720 for clean output
  • Frame rate: 24 fps is cinematic and balanced — 16 fps works if you’re prototyping

Once you go past 5 seconds, things start to drift. Timing cues get mushy, and camera movements lose weight.
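
Frame count and frame rate together set the clip length, so it's worth doing the arithmetic before you queue. A trivial check, nothing WAN-specific:

```python
# Duration = frames / fps. The drift point in my tests was about 5 seconds.
def clip_seconds(frames: int, fps: int) -> float:
    return frames / fps

print(clip_seconds(120, 24))  # 5.0 -> right at the ceiling
print(clip_seconds(120, 16))  # 7.5 -> past it; at 16 fps I'd cap frames near 80
```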

Negative Prompt (Yes, Still Matters)

One of the quiet upgrades in WAN 2.2 is how much better it respects negative prompts. Before, you’d throw in “bad anatomy” and still get mangled faces. Now it listens.

I’ve mostly stuck with the default negative prompt. It’s long, but it covers a lot:

“bright colors, overexposed, static, blurred details, subtitles, style, artwork, painting, picture, still, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers, still picture, cluttered background, three legs, many people in the background, walking backwards”

You can shorten it if you’re doing something stylized. But for realistic scenes, leaving it as-is avoids the usual weirdness.
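
To keep all six pieces straight, I've started assembling prompts from a small template. What follows is a minimal sketch using my own field names (nothing here is a WAN or ComfyUI API); temporal settings and the negative prompt live in their own node fields, so only the in-prompt pieces get rendered:

```python
# Hypothetical prompt builder: field names and helpers are mine, not a WAN API.
from dataclasses import dataclass

# The default negative prompt quoted above, verbatim.
NEGATIVE = (
    "bright colors, overexposed, static, blurred details, subtitles, style, "
    "artwork, painting, picture, still, overall gray, worst quality, low quality, "
    "JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, "
    "poorly drawn faces, deformed, disfigured, malformed limbs, fused fingers, "
    "still picture, cluttered background, three legs, many people in the "
    "background, walking backwards"
)

@dataclass
class ShotPrompt:
    opening: str      # what the camera sees first
    camera_move: str  # one movement verb: crane up, dolly out, pan left...
    reveal: str       # what the move uncovers
    motion: str       # pacing / parallax modifiers
    aesthetics: str   # one or two lighting, grade, or lens tags

    def render(self) -> str:
        # Keep the visual sequence in order: see -> move -> reveal.
        text = (
            f"{self.opening} Camera {self.camera_move}, revealing {self.reveal}. "
            f"{self.motion}. {self.aesthetics}."
        )
        words = len(text.split())
        if not 80 <= words <= 120:
            print(f"note: {words} words; ~80-120 was the sweet spot in testing")
        return text

# Deliberately short sample (it will trigger the word-count note above).
positive = ShotPrompt(
    opening="Wide shot of a lone figure standing in a wheat field at dusk.",
    camera_move="cranes up slowly",
    reveal="distant mountains under purple skies",
    motion="Foreground wheat sways in the wind; the ridgeline stays still",
    aesthetics="Volumetric dusk light, teal-and-orange grade, subtle 16mm grain",
).render()
```

The word-count check is just a guardrail for the 80–120 sweet spot; the shot order itself is enforced by how render() strings the pieces together.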

What Worked — And What Didn’t — When Testing Prompts

I ran a bunch of test prompts through WAN 2.2 to see how well it handled shot structure, motion, and style. Some worked right away. Some… didn’t.

What’s interesting is how consistently it followed camera cues compared to 2.1. That alone is a huge upgrade.

Here’s what I saw:

Test 1: Cyberpunk Tracking Shot

Prompt used:

“A rainy night in a dense cyberpunk market, neon kanji signs flicker overhead. The camera starts shoulder-height behind a hooded courier, steadily tracking forward as he weaves through crowds of holographic umbrellas. Volumetric pink-blue backlight cuts through steam vents, puddles mirror the glow. Lens flare, shallow depth of field. Moody, Blade-Runner vibe.”

This one came out solid.

It respected the tracking shot. The lighting hit that “Blade Runner” glow with just the right mix of blues and pinks. The volumetric mist helped sell the depth, and reflections in the puddles actually worked — not always a given.

And unlike in 2.1, the camera followed the subject smoothly without jittering off into space halfway through.

So yeah — this kind of WAN 2.2 prompt is right in its wheelhouse.

Test 2: Alpine Pullback

Prompt used:

“Extreme close-up of a mountaineer’s ice axe biting into frozen rock. Camera dollies back and tilts up simultaneously, revealing the climber and a vast sunrise-lit alpine ridge behind him. Crisp morning air, golden rim-light, subtle lens flare.”

This one kind of flopped.

The expected pullback just didn’t happen — or at least not in a way that matched the prompt. WAN 2.2 seemed to struggle with syncing dolly + tilt in this case.

So while the lighting was decent and the subject framing was OK, the camera move didn’t land. Could just be the complexity of the combo movement — might need to separate them out next time.

Test 3: Orbital Slow Motion (Aquatic Shot)

Prompt used:

“An orca breaches in crystal-clear Arctic waters. Slow 360° orbital shot around the soaring whale as droplets hang suspended. Soft polar sunset lights the scene in pastel pinks and blues; cinemagraphic HDR.”

Parts of this worked.

It definitely respected the slow-motion — you could feel the weight of the orca mid-breach. The lighting was beautiful. But the orbital camera move? Totally ignored.

I think five seconds might have been too short to complete the rotation. Either that or WAN 2.2 just isn’t great with full 360 orbits yet. Still, the fact that the timing and light held together at all puts it miles ahead of 2.1.

Comparing WAN 2.2 to 2.1 on Motion Control

To get a clearer picture, I went back and reused a bunch of old prompts from my WAN 2.1 tests. The idea was to see how well 2.2 improved shot consistency — especially for camera verbs.

Here’s how that shook out:

Pan Left / Right

With WAN 2.1, getting the pan direction right was almost a coin flip. Sometimes you’d say “pan left” and it would pan right. Or halfway through the pan, it’d cut to a completely different angle.

Old prompt:

“A low angle shot of a jazz pianist in a dimly lit 1920s jazz bar, playing the piano with concentration. He wears a white shirt with suspenders and black trousers, his hands move rapidly on the keys. Camera pans left to low angle shot of a cute girl with pigtails and glasses playing the trumpet.”

On WAN 2.2? Same prompt. Changed “pan left” to “pan right.”

It just worked. First try.

No weird cut, no direction bug. Smooth pan to the trumpet player. That’s the kind of consistency you need when writing a WAN 2.2 prompt that has any kind of camera choreography.

Whip Pan

Fast pans — especially for transitions — were nearly impossible on 2.1. You could write “whip pan” and it’d either ignore you or break the scene mid-transition.

With WAN 2.2, I finally got something usable. It’s not perfect, but the scene doesn’t glitch out during the motion anymore. Still a bit soft on transition timing, though.

Pull Back

This was actually one of the few things WAN 2.1 could do fairly well.

Prompt used:

“Close up shot of the determined face of a battle-worn samurai. Camera pulls back to reveal him standing alone on a foggy battlefield, gripping his katana. Camera pulls back to reveal fallen warriors behind him. Wind whips through the trees, sending red autumn leaves swirling.”

In WAN 2.2, it’s even smoother. The leaves drifting mid-pullback look gorgeous, and the motion doesn’t jitter. A reminder that even if something worked in 2.1, it probably works better now.

More Motion Tests — What WAN 2.2 Can Actually Pull Off

There’s a lot of subtle camera work you just couldn’t rely on in WAN 2.1. You could try dolly-outs or tilts, but most of the time, they either didn’t register or came out looking wrong.

So I reran those same shots in WAN 2.2 — just to see how much better it got.

Dolly In / Out

WAN 2.1 was fine at dolly-ins. But dolly-outs? Total failure. Every prompt just defaulted back to dolly-in no matter how you phrased it.

With 2.2, the same exact prompt (just swapping “in” to “out”) worked on the first go.

Prompt used:

“In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit, with the words ‘Breaking Bad’ written in sans-serif English above him, surrounded by piles of dollar bills and blue plastic storage boxes. He wears glasses, staring forward, dressed in a yellow jumpsuit, with his hands resting on his knees, exuding a calm and confident demeanor. Camera dollies out. The background shows an abandoned, dim factory with light filtering through the windows. There’s a noticeable grainy texture.”

It pulled the camera back smoothly while holding center frame. Which is something 2.1 never managed, even with tons of prompt tweaking.

This kind of movement — slow, controlled dolly out — finally works like it should. And that’s a big deal for anyone writing cinematic WAN 2.2 prompts.

Tilt Up

Here’s another one that 2.1 just couldn’t do right. You could say “camera tilts up,” and the model would either jump to a cutaway or pan sideways.

Prompt used:
“A close-up shot of the feet of a man wearing mountaineering gear, standing in a grassy field. Camera slowly tilts up, revealing the full body of a mountaineer wearing gear. In the distance, majestic rocky mountains tower above.”

On WAN 2.2, this was basically perfect. The motion was subtle and steady — from boots to full frame, with just enough environmental context. It’s not flashy, but it’s exactly the kind of movement that sells realism.

Tracking Shot

WAN 2.1 already did tracking fairly well — at least when you gave it a subject to follow in motion. 2.2 improves that mostly by making the path smoother and adding more consistency in background elements.

Prompt used:
“A sprawling cyberpunk metropolis, neon lights reflecting off rain-soaked streets. Pedestrians in futuristic outfits rush by as holographic advertisements flicker in the air. The camera follows a hooded figure in a long tracking shot, weaving through the crowded market. Overhead lights cast a moody glow, while fog drifts through the alleyways. The scene is dark and mysterious, with blue and purple lighting creating a high-tech, dystopian feel.”

Still one of the most reliable styles to prompt — and 2.2 makes it feel way more intentional.

Crash Zoom

This was almost impossible in WAN 2.1. Any time I tried “crash zoom” or “rapid zoom,” it just glitched. Frames would skip, or the motion would turn into a jittery dissolve.

WAN 2.2 nailed it on the first attempt.

Prompt used:
“In a large dimly lit midcentury modern room, a man sits with an authoritative and pensive pose on a leather chair. He is wearing a dark suit jacket and grey trousers. He has silver hair. The chair is in the center of the screen. Behind the chair, there is an oak console with a lamp. The wall is made of oak panels. The man looks directly at the camera. Camera rapidly zooms in on the man’s face. Then he lets out a slight smirk.”

The whole sequence had that tight, intentional snap-zoom feel. And no frames broke. That’s a huge upgrade.

Camera Roll

This one surprised me.

Even after 10+ prompt tries, I couldn’t get 2.1 to do a proper camera roll. It would warp the shot, but the rotation effect never really landed.

WAN 2.2 got it instantly.

Prompt used:
“Overhead shot of a man fallen asleep on his desk in front of his computer. The room is dark except for the light from the monitor. The man’s head is on his arms by the keyboard. Around the desk, there is a mess of papers and floppy disks. The camera rolls in full 360 motion.”

It pulled off a complete rotation, and it actually felt cinematic — like something from a dream sequence or psychological thriller. Definitely the best surprise from this round of testing.

Running WAN 2.2 in ComfyUI (What Actually Works Locally)

If you’re using ComfyUI, WAN 2.2 is surprisingly easy to get running — especially if you’re just messing around locally. You don’t need massive VRAM if you’re okay with the hybrid model.

Here’s the breakdown:

The Models You Can Run

There are three models in the WAN 2.2 line:

| Model Type | Name | Size | Purpose |
| --- | --- | --- | --- |
| Hybrid (TI2V) | Wan2.2-TI2V-5B | 5B | Text+Image to Video; small enough to run on a 4060 |
| Image-to-Video | Wan2.2-I2V-A14B | 14B | High-quality video from a still-image input |
| Text-to-Video | Wan2.2-T2V-A14B | 14B | Text-only prompt input, cinematic framing support |

The 5B hybrid version is the one I’ve been using for most prompt testing. It runs at 720p, uses a reasonable amount of VRAM (around 8 GB with offload), and gives you the flexibility to try both text-to-video and image-to-video in one workflow.

You can grab it from the WAN 2.2 model repo on Hugging Face. That includes:

  • The wan2.2_ti2v_5B_fp16.safetensors model
  • A separate VAE (wan2.2_vae.safetensors)
  • And the text encoder: umt5_xxl_fp8_e4m3fn_scaled.safetensors
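
If you'd rather script the downloads than click through, here's a sketch using huggingface_hub. The repo id and subfolder paths are my assumptions about the repackaged repo's layout, so verify them against the model page; comfy_root should point at your own install:

```python
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

REPO = "Comfy-Org/Wan_2.2_ComfyUI_Repackaged"  # assumed repo id; check the link above

# Assumed in-repo paths -> standard ComfyUI model folders.
FILES = {
    "split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors": "models/diffusion_models",
    "split_files/vae/wan2.2_vae.safetensors": "models/vae",
    "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/text_encoders",
}

comfy_root = Path("ComfyUI")  # adjust to your install
for repo_path, model_dir in FILES.items():
    cached = hf_hub_download(repo_id=REPO, filename=repo_path)  # lands in the HF cache
    dest = comfy_root / model_dir / Path(repo_path).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)  # copy out of the cache into ComfyUI's folders
```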

Tip for ComfyUI Users

If you’re used to using the WAN 2.1 workflow nodes, WAN 2.2 works almost identically — just swap in the newer models.

I did have to adjust some node timings though. The sweet spot seems to be around:

  • Resolution: 960×540 (for tests), 1280×720 (for final)
  • Frames per second: 24 fps looks great; 16 fps for quick iterations
  • Frame count: Anything up to 120 frames (about 5 seconds) is fine

Once you go over 5 seconds, motion quality starts to degrade — WAN 2.2 can’t really hold multi-subject choreography for long clips unless you baby it through every line of the prompt.
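
For what it's worth, I keep those two configurations as presets so I'm not retyping node values between runs (the naming is mine, not ComfyUI node fields):

```python
# Width/height/fps/frame-count presets matching the settings above.
PRESETS = {
    # At 16 fps I cap frames lower to stay under the ~5 s drift point.
    "test":  {"width": 960,  "height": 540, "fps": 16, "frames": 80},
    "final": {"width": 1280, "height": 720, "fps": 24, "frames": 120},
}

settings = PRESETS["final"]  # swap to "test" for quick iterations
```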

The one thing that did trip me up early was forgetting to adjust my WAN 2.2 prompt to match its default “cinematic” bias. If you under-describe something, it’ll make assumptions — and they aren’t always what you want.

Final Thoughts on WAN 2.2 Prompts (And What Actually Matters)

WAN 2.2 is a real step forward — not just on paper, but in the kind of stuff that actually affects your workflow.

Is it perfect? No. There are still weird spots where camera logic fails or a prompt gets interpreted too literally. But if you’re coming from 2.1, this version finally feels like you’re not fighting the tool.

And here’s the thing: most of the wins come from just writing better prompts.

Not longer ones. Just more intentional.

If you focus on:

  • camera verbs that make sense (dolly, tilt, pan)
  • lighting details (neon rim light, golden hour backlight)
  • and temporal limits (under 120 frames)

…the model does the rest. That’s where the new MoE backbone really shines — filling in the vibe without overriding your structure.

The cinematic defaults work better now, too. So if you do underspecify something, it still tries to carry the mood without breaking the scene. That wasn’t true in 2.1.

Also, the fact that you can get decent 720p clips at 24 fps on a single GPU — that’s not a small deal. Especially when paired with ComfyUI workflows that don’t need a backend server.

If you want to try it yourself, I’ve got the full ComfyUI workflow up — it works for both image and text-to-video, and it’s optimized for low VRAM setups. You can grab it here.
