I’ve been trying one long-video generation workflow after another, and when I discovered LongCat-Video I thought: this might actually work for minutes-long clips instead of short loops. Running it inside ComfyUI makes it even more accessible, and I’m writing this up so you can try it too.
So in this article I’ll show you:
- What LongCat-Video is and why it matters
- How to download and set it up in ComfyUI
- My test workflow and settings for low-VRAM GPUs
What is LongCat-Video?
The model comes from Meituan’s LongCat team. According to the official docs it has 13.6 billion parameters and supports three key tasks: text-to-video (T2V), image-to-video (I2V), and video-continuation (extending an existing clip).
What makes it stand out:
- It’s built to produce minutes-long videos without common failures like color drift or broken motion.
- It uses a coarse-to-fine generation strategy along time and space, plus “Block Sparse Attention” to help efficiency.
- It’s open-source (MIT license) so you can download, tweak and use it.
So if you’ve been stuck at 5–10 second clips, this model gives you a chance to push to 30 seconds, a full minute, or even more.
Downloading and Setup in ComfyUI
First, I’ll show you how I set up the LongCat Video workflow inside ComfyUI.
It’s simple. You can follow the same steps and get the same result even on a low-VRAM GPU.
Files You Need
Go to the Kijai LongCat-Video Comfy page on Hugging Face.
You’ll see three main model files there.
- LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors – around 15.5 GB. Best if your GPU has 12 to 16 GB VRAM.
- LongCat_TI2V_comfy_bf16.safetensors – about 27 GB. Use this if you have a 24 GB card or higher.
- LongCat_refinement_lora_rank128_bf16.safetensors – this LoRA improves frame-to-frame consistency and sharpness.
There’s also a smaller LongCat_distill_lora_alpha64_bf16.safetensors file.
With the Distill LoRA you can cut the number of steps almost in half. It makes generation much faster.
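If you prefer to script the downloads instead of clicking through the browser, here’s a minimal sketch using the huggingface_hub package. The repo id is an assumption based on Kijai’s usual naming, so confirm it on the actual Hugging Face page before running; the bf16 model is omitted since this guide targets low-VRAM cards.

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# NOTE: the repo id below is an assumption; check it against the Kijai page first.
from huggingface_hub import hf_hub_download

REPO_ID = "Kijai/LongCat-Video_comfy"  # assumed repo id

FILES = [
    "LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors",  # ~15.5 GB, for 12-16 GB VRAM
    "LongCat_refinement_lora_rank128_bf16.safetensors",      # refinement LoRA
    "LongCat_distill_lora_alpha64_bf16.safetensors",         # distill LoRA, fewer steps
]

for filename in FILES:
    path = hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir="longcat_downloads")
    print("Saved:", path)
```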
You’ll also need ComfyUI installed.
If you don’t have it yet, download the latest build from GitHub and run the Manager once to fetch the base nodes.
Make sure your Python and PyTorch versions are current.
If you have an NVIDIA GPU, install FlashAttention 2 as well; it helps keep memory use smooth during sampling.
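If you want a quick sanity check of that environment before loading a 15 GB model, a small script like this works. Run it in the same Python environment ComfyUI uses; the flash_attn import is optional and only reports whether the package is installed.

```python
# Quick environment sanity check for the ComfyUI Python environment.
import sys
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("VRAM (GB):", round(props.total_memory / 1024**3, 1))

# FlashAttention 2 is optional but recommended on NVIDIA cards.
try:
    import flash_attn
    print("flash_attn:", flash_attn.__version__)
except ImportError:
    print("flash_attn not installed (pip install flash-attn --no-build-isolation)")
```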
Placing the Files
After I downloaded everything, I opened my ComfyUI folder.
Inside, there are a few folders you’ll use.
I dropped the LongCat model files inside models/diffusion_models.
Then I placed both LoRA files in models/loras.
Once they were in, I restarted ComfyUI so it could read the new models.
You’ll see them load automatically when the app starts.
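To double-check everything landed in the right place before restarting, a short path check like this helps. It assumes a default ComfyUI folder layout; adjust COMFY_ROOT to wherever your install actually lives.

```python
# Sketch: verify the LongCat files are where ComfyUI expects them.
# COMFY_ROOT assumes a default layout; change it to your own install path.
from pathlib import Path

COMFY_ROOT = Path("ComfyUI")  # assumed install directory

expected = {
    COMFY_ROOT / "models" / "diffusion_models": [
        "LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors",
    ],
    COMFY_ROOT / "models" / "loras": [
        "LongCat_refinement_lora_rank128_bf16.safetensors",
        "LongCat_distill_lora_alpha64_bf16.safetensors",
    ],
}

for folder, names in expected.items():
    for name in names:
        status = "OK     " if (folder / name).exists() else "MISSING"
        print(status, folder / name)
```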
Setting up the Workflow
When ComfyUI opened, I loaded my workflow.
In my setup, I picked the model LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors from the list.
For the VAE, I used wan2.2_vae.safetensors.
My test run (Low-VRAM GPU)
I tried this on a GPU with 12–16 GB of VRAM to show it works even when you don’t have a high-end card.
- I picked a starting image: “a woman sits at a wooden table by the window in a cozy café”
- I used around 12 steps for this test (instead of the full 50 steps) to save memory and time
- For resolution I used 640×604 (the original image was 2048×2048, and I wanted something more manageable)
- CFG scale = 1
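Just to keep the numbers in one place, here are those test settings gathered into a plain Python dict. The key names are labels for this write-up, not actual ComfyUI node fields.

```python
# My low-VRAM test settings, collected for reference only.
# Key names are mine, not ComfyUI node field names.
test_settings = {
    "model": "LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors",
    "vae": "wan2.2_vae.safetensors",
    "prompt": "a woman sits at a wooden table by the window in a cozy café",
    "steps": 12,               # down from the full 50 to save memory and time
    "resolution": (640, 604),  # source image was 2048x2048
    "cfg_scale": 1,
    "fps": 15,
}
print(test_settings)
```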
Result: I got about 93 frames at 15 fps (a little over six seconds of video), showing the woman picking up the cup.
Then I added another prompt: “An orange cat jumps onto the table”
and extended it. The cat walks in, the woman reacts. Motion stays smooth, same lighting, no weird color shifts.


