ComfyUI LongCat Workflow: Make Very Long Videos on Low VRAM

I’ve been trying one long-video generation workflow after another, and when I discovered LongCat-Video I thought: this might actually work for minutes-long clips instead of short loops. Running it inside ComfyUI makes it even more accessible, so I’m writing this up so you can try it too.

So in this article I’ll show you:

  • What LongCat-Video is and why it matters
  • How to download and set it up in ComfyUI
  • My test workflow and settings for low-VRAM GPUs

What is LongCat-Video?

The model comes from Meituan’s LongCat team. According to the official docs it has 13.6 billion parameters and supports three key tasks: text-to-video (T2V), image-to-video (I2V), and video-continuation (extending an existing clip).

What makes it stand out:

  • It’s built to generate minutes-long videos without major issues like color drift or motion breaks.
  • It uses a coarse-to-fine generation strategy along time and space, plus “Block Sparse Attention” to improve efficiency.
  • It’s open-source (MIT license) so you can download, tweak and use it.

So if you’ve been stuck at 5–10 second clips, this model gives you a chance to push to 30 seconds, a minute, or even more.
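To give a rough feel for the “Block Sparse Attention” idea mentioned above, here’s a minimal, purely illustrative PyTorch sketch. It is my own toy version, not LongCat’s actual kernel: tokens are split into fixed-size blocks, and each query block only attends to a few key blocks instead of the full sequence.

```python
import torch

def block_sparse_attention(q, k, v, block_size=64, keep_blocks=4):
    """Toy block sparse attention: each query block attends only to the
    `keep_blocks` key blocks with the highest coarse similarity.
    Illustrative only -- not LongCat-Video's implementation.
    Assumes the sequence length is divisible by block_size."""
    n, d = q.shape
    nb = n // block_size
    qb = q.view(nb, block_size, d)
    kb = k.view(nb, block_size, d)
    vb = v.view(nb, block_size, d)

    # Coarse block-level scores: mean of each query block vs. mean of each key block.
    block_scores = qb.mean(1) @ kb.mean(1).T              # (nb, nb)
    top = block_scores.topk(keep_blocks, dim=-1).indices  # (nb, keep_blocks)

    out = torch.zeros_like(qb)
    for i in range(nb):
        ks = kb[top[i]].reshape(-1, d)   # selected key blocks only
        vs = vb[top[i]].reshape(-1, d)   # matching value blocks
        attn = torch.softmax(qb[i] @ ks.T / d**0.5, dim=-1)
        out[i] = attn @ vs
    return out.view(n, d)

q = k = v = torch.randn(512, 64)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([512, 64])
```

Each query block only ever sees keep_blocks × block_size keys instead of all 512, which is the basic reason this kind of attention scales to long token sequences.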

Downloading and Setup in ComfyUI

First, I’ll show you how I set up the LongCat Video workflow inside ComfyUI.
It’s simple. You can follow the same steps and get the same result even on a low-VRAM GPU.

Files You Need

Go to the Kijai LongCat-Video Comfy page on Hugging Face.
You’ll see three main model files there.
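If you prefer scripting the download instead of clicking through the page, a small huggingface_hub sketch like the one below works. The repo id is my guess at how Kijai’s page is named, so check it against the actual URL, and repeat the call for the LoRA and VAE files listed there.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Repo id is an assumption -- verify it against Kijai's Hugging Face page.
REPO_ID = "Kijai/LongCat-Video_comfy"

# The fp8 diffusion model used later in the workflow; repeat for the
# LoRA files and the VAE shown on the same page.
path = hf_hub_download(
    repo_id=REPO_ID,
    filename="LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors",
    local_dir="downloads/longcat",  # staging folder; we move it into ComfyUI next
)
print("saved to", path)
```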

You’ll also need ComfyUI installed.
If you don’t have it yet, download the latest build from GitHub and run the Manager once to fetch the base nodes.
Make sure your Python and PyTorch versions are current.
If you have an NVIDIA GPU, install FlashAttention 2; it keeps memory use down and speeds up sampling.
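As a quick sanity check before loading anything, a short snippet like this (my own helper, not part of ComfyUI) confirms the PyTorch version, CUDA availability, free VRAM, and whether FlashAttention 2 is importable:

```python
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    free, total = torch.cuda.mem_get_info()
    print(f"VRAM: {total / 1024**3:.1f} GB total, {free / 1024**3:.1f} GB free")

try:
    import flash_attn
    print("FlashAttention:", flash_attn.__version__)
except ImportError:
    print("FlashAttention 2 not installed (optional, but helps on NVIDIA GPUs)")
```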

Placing the Files

After I downloaded everything, I opened my ComfyUI folder.
Inside, there are a few folders you’ll use.

I dropped the LongCat model files inside models/diffusion_models.
Then I placed both LoRA files in models/loras.
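If you scripted the download earlier, a few lines of pathlib/shutil drop the files into the same folders. The paths below are assumptions for my machine (a "downloads/longcat" staging folder and a local "ComfyUI" install), so adjust them to yours:

```python
import shutil
from pathlib import Path

COMFY = Path("ComfyUI")              # adjust to your ComfyUI install path
STAGING = Path("downloads/longcat")  # wherever you downloaded the files

# Diffusion model -> models/diffusion_models, LoRAs -> models/loras
targets = {
    "diffusion_models": ["LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors"],
    "loras": sorted(p.name for p in STAGING.glob("*lora*.safetensors")),
}

for folder, names in targets.items():
    dest = COMFY / "models" / folder
    dest.mkdir(parents=True, exist_ok=True)
    for name in names:
        shutil.move(str(STAGING / name), str(dest / name))
        print(f"moved {name} -> {dest}")
```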

Once they were in, I restarted ComfyUI so it could read the new models.
They’ll show up in the loader dropdowns once the app restarts.

Setting up the Workflow

When ComfyUI opened, I loaded my workflow.

In my setup, I picked the model LongCat_TI2V_comfy_fp8_e4m3fn_scaled_KJ.safetensors from the list.

For the VAE, I used wan2.2_vae.safetensors.

My test run (Low VRAM GPU)

I tried this on a low-VRAM GPU (12–16 GB) to show it works even when you don’t have a high-end card.

  • I picked a starting image: “a woman sits at a wooden table by the window in a cozy café”
  • I used around 12 steps for this test (instead of the full 50 steps) to save memory and time
  • For resolution I used 640×604 (the original image was 2048×2048, and I wanted a manageable size)
  • CFG scale = 1

Result: I got about 93 frames at 15 fps, showing the woman picking up the cup.
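For reference, the frame count translates into clip length like this (simple arithmetic, nothing model-specific):

```python
frames = 93
fps = 15
print(f"{frames} frames at {fps} fps = {frames / fps:.1f} seconds")  # ~6.2 seconds
```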

Then I added another prompt: “An orange cat jumps onto the table” and extended the clip with the video-continuation step. The cat walks in, the woman reacts, the motion stays smooth, the lighting stays consistent, and there are no weird color shifts.

By Esha

Studied Computer Science. Passionate about AI, ComfyUI workflows, and hands-on learning through trial and error. Creator of AIStudyNow — sharing tested workflows, tutorials, and real-world experiments.