Qwen Image GGUF Low VRAM ComfyUI Workflow (Text + Img2Img + Realism)

Run Qwen Image on low VRAM with GGUF. Lightning 4 and 8 steps, plus a small realism LoRA. Same seed. Same scene. Clear settings you can copy.

I keep this simple. First I explain what changed. Then I show what to download, where to place the files, and how to set the graph. After that I run the tests: baseline FP16, 8 steps, 4 steps, and a low VRAM Q4 pass. I also try the realism LoRA, a product shot, a game UI, and a quick image to image polish. If you want a quick primer on my ComfyUI flow, I have an older Qwen post you can skim on my site. You can also grab ComfyUI from GitHub and the model cards on Hugging Face if you are starting fresh.

What changed in this run

LightX2V shipped Lightning LoRAs for Qwen Image. The goal is speed while keeping text and layout stable.

Lightning 8 step: fast and still detailed
Lightning 4 step: very quick and good for low VRAM
A small realism LoRA: adds a natural look when you need faces to sit right

What you need for the Qwen Image GGUF Low VRAM Comfyui Workflow

Pick one Lightning LoRA. Only one should be active.

Optional realism add on:

flymy_realism.safetensors

Place LoRAs here, then restart ComfyUI:
ComfyUI/models/loras/

Base model

Use the quant that fits your GPU and RAM. Smaller Q runs faster but softens a bit. I tested Q8 and Q6. Speed was almost the same. Q6 was only 1 to 2 seconds faster and the quality looked very close. A friend ran Q4 on a 12 GB card and it was fine.

ou can grab the GGUF From Here variants

Start with Q8
If you hit a limit or want more speed, try Q6 or Q5
If you still need memory, go to Q4

Place the base here, then restart:
ComfyUI/models/diffusion_models/

[UPDATE: GGUF model files also go in ComfyUI/models/diffusion_models/. Use the GGUF UNet Loader in the graph and skip the full diffusion checkpoint.]

Encoders and VAE

Text encoder: qwen_2.5_vl_7b_fp8_scaled.safetensors
Place in ComfyUI/models/text_encoders/
VAE: qwen_image_vae.safetensors
Place in ComfyUI/models/vae/

Set the Workflow once

Mode switch in Model Group: 1 for Text to Image, 2 for Image to Image
Pick one loader: GGUF or safetensors
GGUF on, Load Diffusion Model bypassed
or Load Diffusion Model on, GGUF loader bypassed
Load CLIP: pick the Qwen text encoder
Load VAE: pick the Qwen VAE
Canvas: open Empty Latent Size and pick a preset
LoRAs: unbypass the one you need, bypass the rest
Prompt: short and specific
Sampler: set Steps and CFG

I lock the seed. I let the output follow the image size unless I say to rescale.

Baseline (FP16, no LoRA)

Model: full FP16
Steps / CFG: 20 / 4
Size: 1280×713

Prompt:

A coffee shop entrance features a chalkboard sign reading “Qwen Coffee 😊 $2 per cup,” with a neon light beside it displaying “AI STUDY NOW”. Next to it hangs a poster showing a beautiful woman, and beneath the poster is written “π≈3.1415926-53589793-23846264-33832795-02384197”.

I hit Generate. 1 minute 26 seconds.

The board says Qwen Coffee. “$2 per cup” is clean. The smiley sits near the price. The neon reads AI STUDY NOW on the left. The poster is sharp. The long pi line is readable. This is the baseline.

Lightning 8 step

LoRA: LightX 8 step
Steps / CFG: 8 / 1
Size: 1280×713

Same prompt. 18 seconds.

Board still reads Qwen Coffee with the smiley. “$2 per cup” is clear. The neon says AI STUDY NOW. The poster stays clean. The long pi line is complete. On close zoom the thinnest chalk edges are a little softer than FP16. The scene and layout match.

Lightning 4 step

LoRA: LightX 4 step
Steps / CFG: 4 / 1
Size: 1280×713

Same prompt. 8 seconds.

Qwen Coffee is readable. “$2 per cup” is clean. The smiley lands in the right spot. The neon glow looks good. The poster is sharp with natural skin tone. The long pi line stays clean and grouped the same way.

Low VRAM preset

Base: Q4
LoRA: 4 step
Steps / CFG: 4 / 1
Size: 1280×713

Same prompt. 8 seconds. No memory issues.

The board says Qwen Coffee with the smiley. “$2 per cup” is clear. The neon reads AI STUDY NOW. The poster is sharp. The long pi line shows up. Fine chalk texture is a bit softer than FP16, but most people will not notice.

Quick line to keep: for text on low VRAM, Q4 with 4 step works well.

Realism test

Model: BF16
LoRA: flymy_realism.safetensors
Steps / CFG: 50 / 5

I add the word “Realism” at the start of the prompt.

I hit Generate.

Skin and light feel natural. Beard edges are clear. The coat looks matte. A white earbud is visible. The café detail fits, and “AI STUDY NOW” is readable on glass. Depth feels right. Face sharp. Background softer.

I run the same prompt without the realism LoRA. It is good. With realism on at 50 and 5 it looks better. I also tried Lightning with realism. At 4 steps the skin looks plastic. At 8 steps it looks better than 4, but still not as natural as realism at 50.

Low VRAM check
I switch the base to Q4.

8 step Lightning with realism turns noisy
Q4 at 50 and 5 with realism is still unstable
Q8 at 50 and 5 with realism works again
Q4 without the realism LoRA runs, but the face looks a bit AI

Simple rule: for the best realism, use the realism LoRA alone at about 50 steps on Q8. No Lightning. It is slow. It is not a low VRAM trick.

Product shot

Base: Q4
LoRA: Lightning 4 step
Steps / CFG: 4 / 1
Size: 1920×1080

I hit Generate.

A long neck glass bottle with a pry off cap. A metallic label that shifts from deep blue to neon purple. AISTUDYNOW in big bold letters. A lightning graphic wraps the label. I can read Boost Mode and Hyper Charge. Heavy condensation on the glass. Ice at the base. Cool vapor in the air. Smooth bokeh behind it. It reads close to FP16, but runs light on VRAM.

Game UI

Base: Q4
LoRA: Lightning 4 step
Steps / CFG: 4 / 1
Size: 1920×1080

I hit Generate.

A driftwood logo with a skull and crossbones. A soft green eye glow. Three buttons in a vertical stack: PLAY, SETTINGS, QUIT. Rope and bits of seaweed on the wood. Light rust on the rivets. Distressed red letters with a small drip. Ash motes and a touch of sea mist. In the back, a moonlit bay with ships, rocks, and torn sails. The wood banner reads AI STUDY NOW and stays clear.

Quick image to image polish

Mode: Image to Image
Upload: your source in the Image Group
Empty Latent Size: bypassed
Prompt: short, usually the original prompt plus one small note

Denoise

Light polish: about 0.30 to 0.40. At 0.35 the layout holds. Textures sharpen. Small edges pop.

If denoise is below 0.30 you will barely see a change. If it is above 0.80 the scene can drift.

Bigger change: about 0.70. I add “poster shows a man”. Generate. The poster swaps to a man. The rest stays close.

Presets I keep

FP16 at 20 steps and CFG 4 for the cleanest edges
FP8 at 20 steps and CFG 2.5 as a daily balanced run
Lightning 8 step at CFG 1 when I want speed and the layout to hold
Q4 with Lightning 4 step at CFG 1 for low VRAM text and layout work
Realism LoRA alone at about 50 steps on Q8 for faces and light

Small notes and links

Restart ComfyUI after copying models so the files show up
Keep the output at the image size unless you need a resize
A short internal link if you want to skim my older build on Qwen and ComfyUI is on my site
You can grab the ComfyUI repo on GitHub and the Qwen model pages on Hugging Face

qwen adavanced new(aistudynow.com) (1)Download

Qwen Image GGUF Low VRAM ComfyUI Workflow (Text + Img2Img + Realism)

What changed in this run