I was just browsing around for workflow ideas when I saw something odd — a new model called HiDream-E1.1.
It’s supposed to be an image editing model, kind of like Kontext Flux.
But one thing instantly caught my attention: no resolution cap.
If you’ve used the older version before, you know exactly what I’m talking about.
You couldn’t go past 768×768 pixels; that limit was hardcoded.
This new one? No such limit.
That alone made me pause.
So I clicked through to the Hugging Face page — and that’s where things got even more interesting.
It only had about 100 downloads. Barely anyone was talking about it.
Naturally, I had to test it.
I opened ComfyUI and started building a new workflow from scratch.
Same setup I always go with:
- Model group
- Image upload
- Resize Image (I use the v2 node)
- Sampling group and all the required sampling nodes
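Side note: once the graph is saved, you don’t have to drive it from the browser at all. Here’s a minimal sketch, assuming ComfyUI is running on its default local port and you’ve exported the graph with “Save (API Format)”; the filename hidream_e1_1_edit.json is just a placeholder:

```python
# Minimal sketch: queue an exported ComfyUI workflow headlessly.
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"   # default local ComfyUI address
WORKFLOW_FILE = "hidream_e1_1_edit.json"     # placeholder: your exported API-format graph

with open(WORKFLOW_FILE, "r", encoding="utf-8") as f:
    workflow = json.load(f)

# ComfyUI's /prompt endpoint expects {"prompt": <node graph>}.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    COMFY_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    # The response includes a prompt_id you can use to track the queued job.
    print(json.loads(resp.read()))
```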
Once everything was wired together properly, it was finally time to run it.
And I’ll be honest — I was curious to see if this model would actually hold up.
What You’ll Need: The Full File List
Before you even run this, here’s everything you’ll want in place.
Main Model File
You’ll need the HiDream-E1-1 full BF16 model.
If you’re low on VRAM, there’s also an FP8 version — both are up on Hugging Face. You can grab them here:
HiDream-E1-1 (BF16) or the GGUF versions from ND911.
Drop either one into your ComfyUI/models/diffusion_models/ folder.
There’s also a build from gorillaframeai that includes Q2 to Q8 quantizations.
Lower Q = less VRAM. Higher Q = better results.
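If you’d rather script the download than click through the browser, here’s a rough sketch using huggingface_hub. The repo ID and filename below are assumptions based on the links above, so double-check them on the actual model page before running:

```python
# Rough sketch: pull the main model straight into ComfyUI's model folder.
# repo_id and filename are assumptions -- confirm them on the Hugging Face page.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="HiDream-ai/HiDream-E1-1",           # assumed repo ID
    filename="hidream_e1_1_bf16.safetensors",    # assumed filename; FP8/GGUF builds differ
    local_dir="ComfyUI/models/diffusion_models", # adjust to your install path
)
```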
VAE File
You’ll also need the corresponding VAE.
This one’s hosted under the official HiDream-ai page. Save it to your models/vae/ folder.
Text Encoder Files
This part is a little more involved.
You’ll need all four:
- CLIP L
- CLIP G
- LLaMA 3.1 8B Instruct FP8
- T5-XXL FP8
They’re available in both .safetensors and .gguf formats.
If you’re running a low-VRAM setup, GGUF might be your better bet, and you can get everything from lodestones’ text encoder repo.
The Meta LLaMA 3.1 GGUF version is up here:
bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
Yeah, it’s a few files — but once they’re in place, the model runs just fine.
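Since it’s easy to miss one of the four, here’s a tiny sanity check you can run from the folder that contains your ComfyUI install. The filenames are assumptions; rename them to match whatever you actually downloaded (text encoders live in models/text_encoders/, or models/clip/ on older installs):

```python
# Minimal sanity check: confirm all four text encoders are where ComfyUI expects them.
# Filenames are assumptions -- rename to match your downloads.
from pathlib import Path

ENCODER_DIR = Path("ComfyUI/models/text_encoders")
EXPECTED = [
    "clip_l.safetensors",                     # CLIP L (assumed name)
    "clip_g.safetensors",                     # CLIP G (assumed name)
    "llama_3.1_8b_instruct_fp8.safetensors",  # LLaMA 3.1 8B Instruct (assumed name)
    "t5xxl_fp8.safetensors",                  # T5-XXL (assumed name)
]

for name in EXPECTED:
    status = "ok" if (ENCODER_DIR / name).exists() else "MISSING"
    print(f"{status:>8}  {ENCODER_DIR / name}")
```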
Built-In GGUF Support (If You Need It)
A lot of users — especially on lower VRAM setups — prefer GGUF models over safetensors.
So I added GGUF support directly into the workflow.
Here’s how to switch it:
Main Model
In the Model Loader group:
- Bypass the standard Load Diffusion Model node.
- Un-bypass the GGUF Loader (Condenser) node.
- Point it to the HiDream-E1.1 model.
Text Encoder
Same idea:
- Bypass the default Quadruple CLIP Loader.
- Un-bypass the Quadruple GGUF Loader.
- Connect everything to the Set Hi-Dream CLIP input nodes.
Done. That’s all it takes to switch formats.
Now it was time to test if this thing actually worked.
Checking Resolution (And a Small Trick to See If You’re Over 1M Pixels)
Before running the workflow, I wanted to make sure the image size wouldn’t break anything.
The model’s documentation says it supports “1M pixel” images.
But let’s be real — there’s no way to guess an image’s pixel count just by looking at it.
So I added a small fix to the workflow:
A Math Expression node.
All I did was connect the image width and height outputs into it.
Then, inside the formula field, I used a * b, which gives you the total pixel count (width × height).
Now every time you upload an image, you instantly see its total pixel count. No guessing, no math.
Simple tweak. Big time-saver.
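If you want to run the same check outside ComfyUI, it’s one multiplication. A quick sketch with Pillow, where input.png is just a placeholder path:

```python
# Same check as the Math Expression node: total pixels = width * height.
# "input.png" is a placeholder; point it at your own image.
from PIL import Image

with Image.open("input.png") as img:
    width, height = img.size

total = width * height
print(f"{width} x {height} = {total:,} pixels")
print("over the ~1M pixel guideline" if total > 1_000_000
      else "within the ~1M pixel guideline")
```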
Trying to Fix the Slowness
The first run was slow, and I figured maybe it wasn’t just the model.
Maybe it was my setup.
So I made a few changes:
- Updated ComfyUI to the latest build
- Swapped the image for a smaller version (resized to 600 × 877; see the sketch after this list)
- Checked pixel count again: 600 × 877 = 526,200, so roughly half a million pixels
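If you’d rather automate that downscale than eyeball it in a resize node, here’s a rough sketch that shrinks any image to a pixel budget while keeping its aspect ratio. The paths are placeholders, and the 500K target roughly matches what the 600 × 877 resize works out to:

```python
# Rough sketch: shrink an image to a target pixel budget, keeping aspect ratio.
# Paths and the 500K target are placeholders.
from PIL import Image

TARGET_PIXELS = 500_000  # adjust to taste

with Image.open("input.png") as img:
    scale = (TARGET_PIXELS / (img.width * img.height)) ** 0.5
    if scale < 1:  # only shrink, never upscale
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    img.save("input_resized.png")
    print(f"Saved {img.width} x {img.height} = {img.width * img.height:,} pixels")
```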
Ran the workflow again…
This time, it completed in 1 minute and 30 seconds.
Huge improvement.
But here’s the catch:
Even though it was faster, the color quality was still weird.
The dress turned red — but it looked oversaturated and kind of fake.
Not natural. Not subtle. Just… off.
So yeah, faster is nice — but I wasn’t happy with how it looked.
Side-by-Side With Kontext Flux
Of course, I had to give the same prompt to Kontext Flux.
It’s been my go-to model for a while now.
Same image. Same scene.
And the bullets? Yeah, they were gone too.
But here’s the difference:
Neo’s hand was slightly off.
The pose shifted just a bit — even though I hadn’t asked for that.
Quick clarification here:
In my first test with HiDream-E1.1, I included the line “leaving only Neo in his power pose” — just to be specific.
So for fairness, I removed that line when testing both models again.
Even with that change, Kontext Flux handled the cleanup better.
It erased the bullets and kept the body pose intact.
More precise. More faithful.
Dress Test Redux: Kontext Wins Again
To really be sure, I went back to the original blue dress image.
Ran it through Kontext Flux.
Prompt stayed the same —
Change the color of the woman’s dress to red.
The result?
Almost perfect.
The color shift was subtle and natural.
You’d never know it wasn’t the original photo.
Even the transparent sections of the dress — those tricky sheer areas — were preserved without glitches.
No artifacts. No weird overlays. Just a smooth transition from blue to red.
Zoom in, and everything still holds up.
It felt like the kind of output you’d actually use in production — not just a demo.
Final Thoughts
After all that, I don’t think I need to run more tests.
The difference between HiDream-E1.1 and Kontext Flux is pretty clear —
HiDream has potential, but still needs polish.
Kontext is just more consistent, especially when accuracy matters.
Both workflows are linked in the description if you want to try them yourself.
Kontext Flux workflow: https://aistudynow.com/flux-1-kontext-comfyui-workflow-low-vram-setup-with-cache-lora/