How to Run LTX-2 on Low VRAM: The Complete GGUF Comfy UI Workflow

By Esha

If you look at the official requirements for the new LTX-2 model, they call for massive hardware: 24GB or even 40GB of VRAM to generate a full 1-minute video.

But I found a backdoor.

By using the specific Q4 GGUF compressed model and a hidden “Tiled VAE” trick, I can run this on almost any low VRAM consumer GPU (even a 12GB card).

I am going to show you the exact workflow I used to break the limit, how to fix the “Out of Memory” crashes, and how to use the Camera LoRA to make your shots look professional.

Files You Need To Download

Before we start, we need to get the “lite” versions of these models. I am not using the official full-size files; I am using the GGUF versions by Kijai, a popular ComfyUI wrapper creator.

Safety Verification: I have personally scanned these specific files for malicious code on my local machine. They are verified safe versions from the official HuggingFace repository.

1. The Main Model (GGUF)

  • Download LTX-2 Distilled Q4 GGUF
    • Context: This is the 12GB compressed version. Since it is “Distilled,” you do not need an extra LoRA file; it is built-in.

2. The Text Encoders (You Need Both)

3. The VAE Files

Critical Step: Update Your GGUF Nodes

Before you try to load this workflow, there is one critical step you must do, or ComfyUI will crash immediately.

You have to update your ComfyUI-GGUF nodes.

The old version does not support the LTX-2 architecture.

  1. Open ComfyUI Manager.
  2. Click “Update All.”
  3. Restart ComfyUI.

If you don’t do this, the loader will turn red and fail to load the Q4 model.

The Best Settings for Low VRAM

I started this test by comparing the official FP8 model (21GB) against the Q4 GGUF model (12GB). With the official model, my VRAM filled up instantly and generation stalled. When I switched to Q4, it ran perfectly.

Here are the specific settings you need to change to avoid crashing:

1. The Resolution Rule (Divisible by 32)

Most people try to run 1280×720 and hit an error: LTX-2 requires both width and height to be divisible by 32.

  • Bad: 1280 x 720 (720 is not divisible by 32)
  • Good: 1280 x 736
  • Low VRAM Mode: 640 x 352 (not 640 x 360 — 360 is not divisible by 32)

If you have a 12GB card, start with 640×352. You can upscale it later.
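The divisible-by-32 rule is easy to automate. Here is a tiny sketch (`snap32` and `snap_resolution` are hypothetical helpers of mine, not ComfyUI nodes) that rounds any target resolution half-up to the nearest valid multiple of 32:

```python
def snap32(value: int) -> int:
    # Round half-up to the nearest multiple of 32 (minimum 32).
    return max(32, (value + 16) // 32 * 32)

def snap_resolution(width: int, height: int) -> tuple[int, int]:
    # Both dimensions must independently be divisible by 32.
    return snap32(width), snap32(height)

print(snap_resolution(1280, 720))  # -> (1280, 736)
print(snap_resolution(640, 360))   # -> (640, 352)
```

This reproduces the examples above: 720 snaps up to 736, and 360 snaps down to 352.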

2. The Frame Count Rule (8n+1)

You cannot just pick an arbitrary frame count. The model follows a strict rule: the frame count must be a multiple of 8, plus 1 (8n + 1).

  • 10 seconds at 25 fps = 257 frames
  • 5 seconds at 25 fps = 129 frames

In my workflow, I added a node that calculates this automatically based on seconds, so you don’t have to do the math.
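A sketch of how such a node could compute it (the helper name is mine, not a real ComfyUI node): take the raw frame count for the requested duration, round up to the next multiple of 8, then add 1.

```python
import math

def frames_for(seconds: float, fps: int = 25) -> int:
    # Smallest 8n + 1 frame count that covers the requested duration.
    raw = seconds * fps
    return math.ceil(raw / 8) * 8 + 1

print(frames_for(10))  # 250 raw frames -> 257
print(frames_for(5))   # 125 raw frames -> 129
```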

3. Steps and FPS

Since we are using the Distilled model, you do not need 30 steps.

  • Steps: 8 (This makes it incredibly fast).
  • CFG: 1.0 (Do not go higher).
  • FPS: 25 (the model was trained at 50fps, but 25 plays back smoothly and halves the frames you have to generate).
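For reference, here are the settings above collected into a plain-Python dict. These are my suggested low-VRAM starting points from this guide, not official defaults; the 640×352 resolution follows the divisible-by-32 rule.

```python
# Suggested low-VRAM starting points for the distilled Q4 model
# (values from this guide, not official defaults).
LOW_VRAM_SETTINGS = {
    "steps": 8,      # distilled model needs far fewer than 30 steps
    "cfg": 1.0,      # do not go higher with the distilled model
    "fps": 25,
    "width": 640,    # divisible by 32
    "height": 352,   # divisible by 32
}
```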

The VAE Secret to Fix Crashing

If you try to generate a full 1-minute video, your computer will probably crash at 99% during the “Decode” phase.

I found the fix. It wasn’t the model—it was the VAE.

I set up two VAE decoders in my workflow:

  1. Standard Decoder: Use this for short 5-second previews.
  2. Tiled VAE Decoder: Use this for the full video.

The normal decoder tries to decode the entire video in one pass, which spikes memory. The tiled decoder breaks the latents into small pieces and decodes them one chunk at a time. This is the only way I found to decode a long video on a consumer card without a crash.
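A rough sketch of the idea. Real tiled VAE decoders also tile spatially and blend overlapping edges; here `decode_batch` is just a stand-in for an expensive VAE call, not the ComfyUI API, and the tile size is illustrative.

```python
def decode_batch(latents: list[int]) -> list[str]:
    # Stand-in for an expensive VAE decode over a batch of latents.
    return [f"frame_{i}" for i in latents]

def tiled_decode(latents: list[int], tile: int = 16) -> list[str]:
    frames = []
    for start in range(0, len(latents), tile):
        # Only `tile` latents are decoded (and resident) at once,
        # so peak memory stays bounded no matter how long the video is.
        frames.extend(decode_batch(latents[start:start + tile]))
    return frames

video = tiled_decode(list(range(257)), tile=16)
print(len(video))  # all 257 frames, decoded 16 at a time
```

The trade-off is speed: many small decodes are slower than one big one, which is why the workflow keeps the standard decoder around for short previews.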

How to Use Camera LoRAs (The “Dolly In” Trick)

I created a scene with a killer holding an axe to test the motion.

  • Without LoRA: The character walks, but the camera feels static and boring.
  • With “Dolly In” LoRA: The camera moves forward from behind the character. It looks cinematic.

How to set it up:

  1. Load the LTX Camera LoRA.
  2. Select “Dolly In”.
  3. Set strength to 0.8.

Warning: I found that the model struggles when the camera enters a completely new room (like going through a door). It creates a “hallucination” effect. Keep your shots in one environment for the best results.

Troubleshooting: The “Enhancer” Trap

If you look at the official LTX workflow, you will see a section called “Enhance Prompt”. Delete it.

If you use the built-in Enhancer node, you will get an Out of Memory (OOM) error, because it loads an extra text encoder that eats 8GB of VRAM.

The fix: use ChatGPT or Gemini to rewrite your prompt manually before pasting it into ComfyUI. That saves you the 8GB entirely.

Conclusion

This workflow changes everything for low-end cards. I was able to generate a full 60-second video with consistent details on a standard GPU just by using the Q4 GGUF model and the Tiled VAE trick.

Give it a try and let me know in the comments if you manage to break the 1-minute limit too.
