How to Run LTX-2 on Low VRAM: The Complete GGUF ComfyUI Workflow

If you are looking at a full one-minute video generated on a standard consumer GPU, you might think it is impossible.

According to the official requirements, you shouldn’t be able to do this. They say you need massive hardware to run the new LTX-2 model.

But I found a backdoor.

By using a specific compressed model and a hidden VAE trick, I can run this on almost any low-VRAM GPU.

I am going to show you the exact workflow I used to break the limit. You will also learn how the camera LoRA works to make your shots look professional.

The Comparison: FP8 vs Q4

I created my first video using the massive 21 gigabyte FP8 Distilled model. It worked but it was heavy.

So now I am going to try the Q4 GGUF model, which is nearly half the size.

I started this test by comparing the two main options side by side.

First I tried the official FP8 Distilled model. It is twenty-one gigabytes. That is huge. As soon as I hit run, my memory filled up and everything stopped.

So I switched to the GGUF version, specifically the Q4 model. It compresses the file down to just twelve gigabytes.
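As rough intuition for why the file is about half the size: FP8 stores roughly 8 bits per weight, while Q4-style GGUF quantization stores roughly 4 to 5 bits per weight once the block scales are included. The figures below are assumptions for illustration, not exact file-format math.

```python
# Rough intuition only: bits per weight, ignoring format overhead and mixed-precision layers.
fp8_bits = 8                              # FP8 checkpoint: ~8 bits per weight
q4_bits = 4.5                             # Q4-style GGUF: ~4.5 bits per weight (assumed)
print(round(21 * q4_bits / fp8_bits, 1))  # ~11.8 GB, close to the ~12 GB Q4 file
```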

Critical Step: You Have to Update the GGUF Nodes

But before you try this, there is one critical step.

You have to update your ComfyUI-GGUF nodes.

If you don’t update the node pack, it won’t support LTX-2 and the model will fail to load.

I made sure to update mine first. Then I switched the loader to GGUF, selected the Q4 model, and restarted ComfyUI to clear the VRAM.
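If you installed the node pack by cloning it, a quick pull is all the update takes. Here is a minimal sketch assuming the common city96/ComfyUI-GGUF repository and a default install path; ComfyUI Manager’s update button does the same job.

```python
# Minimal sketch: update (or install) the ComfyUI-GGUF node pack via git.
# Assumes a git-based install; adjust node_dir to wherever your ComfyUI lives.
import subprocess
from pathlib import Path

node_dir = Path("ComfyUI/custom_nodes/ComfyUI-GGUF")  # hypothetical install path

if node_dir.exists():
    subprocess.run(["git", "-C", str(node_dir), "pull"], check=True)
else:
    subprocess.run(
        ["git", "clone", "https://github.com/city96/ComfyUI-GGUF", str(node_dir)],
        check=True,
    )
```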

I hit run.

The result?

I put them side by side and looked closely. They look nearly identical. The Q4 model gives essentially the same result as the FP8 version.

If you are on low VRAM or you want to generate faster, this is the best way to run it.

The Actual Story Scenes I Created

First, let’s look at the actual story scenes I created with this workflow so you can see the quality.

Scene One: The Killer

He has an axe in his hand and walks toward the door. The walking animation looks real.

But notice what happens next. When the axe hits the door, it looks fake. The physics fails.

I found that the model handles walking and talking perfectly, but it still struggles with heavy impacts.

However, the final reaction is great. The girl says “Stay away” and backs up. Her hands look correct and the movement is perfect.

Scene Two: The Ghost Sequence

Now let’s look at the Ghost scene. This is one continuous scene I created.

You can see it has passed 12 seconds and everything looks good. At 15 seconds the animation is still stable.

Up to the 22-second mark the camera stays behind the person, because I am using the Dolly In LoRA here.

Now, after 20 seconds, the Ghost knocks on the door. This is a new image and scene I injected here.

For this specific shot, where the woman hears the knock, I switched to the Static LoRA.

Then I used Dolly In again when the woman speaks to her, and again when the door opens and the ghost is there.

So this is how I created the scene: I went from Dolly In to Static and then back to Dolly In.

The VAE Secret to Fix Crashing

Now, if you try to generate a long video like this, your computer will probably crash at 99%.

I found the fix. It wasn’t the model. It was the VAE.

I set up two VAE Decoders.

The first is a standard decoder, but the second is a Tiled VAE Decoder.

The normal decoder tries to decode and save the whole video at once and overruns your RAM. The Tiled Decoder breaks the job into small pieces. This is the only way to save the video without crashing.
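To make the idea concrete, here is a tiny illustrative sketch of what tiled decoding does: it decodes the latent video a few frames at a time instead of in one giant call. This is not the ComfyUI node’s actual code (the real node also tiles spatially and blends overlapping tiles); it is just the concept.

```python
# Illustration only: decode a latent video in small temporal chunks so the
# decoded pixel frames never all have to sit in memory at the same time.
import torch

def decode_in_chunks(vae_decode, latents, chunk_frames=16):
    """latents: (batch, channels, frames, height, width) latent video tensor."""
    pieces = []
    for start in range(0, latents.shape[2], chunk_frames):
        chunk = latents[:, :, start:start + chunk_frames]  # a few frames at a time
        pieces.append(vae_decode(chunk).cpu())             # push decoded pixels to system RAM
    return torch.cat(pieces, dim=2)
```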

How I Broke the 1 Minute Limit

Now that we have the lighter model and the Tiled VAE, I decided to break the limit.

I set the length to 50 seconds.

With a 32GB GPU this is usually hard, so I used a trick: I lowered the resolution to 640 by 360.

Because I am using the Distilled model at 8 steps, it generated incredibly fast. It took only one minute of real time to generate a 50-second video.

And look at the result. It is insane. There is no color drift.
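For a rough sense of why dropping the resolution helps so much, compare the pixel counts per frame. This is back-of-the-envelope arithmetic, not a benchmark.

```python
# Back-of-the-envelope math for the low-resolution trick.
high = 1280 * 736            # pixels per frame at the high-quality setting
low = 640 * 360              # pixels per frame at the low-VRAM setting
print(round(high / low, 1))  # ~4.1x fewer pixels to generate and decode per frame
```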

But 360p is a bit low quality, so I switched to Image to Video mode. I uploaded a high-quality starter image and set the length to a full minute.

Again it worked perfectly. I got a full one-minute video with consistent details.

Files You Need to Download

Now let’s understand the workflow properly. This time I made the workflow much easier to use. I am not using the main official model; I am using the version by Kijai, a popular wrapper creator.

Here is the exact list of files you need to place in your ComfyUI folders.

  • The Main Model: I used the Distilled Q4 GGUF model by Kijai. Since it is the distilled version, you do not need to download an extra LoRA file; it is already built in.

https://huggingface.co/Kijai/LTXV2_comfy/tree/main/diffusion_models

  • The VAE Files: You need two specific VAE files to make this work. The first is LTX Audio VAE BF16, which goes inside your models/checkpoints folder. The second is LTX Video VAE BF16, which goes inside your models/VAE folder.

https://huggingface.co/Kijai/LTXV2_comfy/tree/main/VAE
  • The Text Encoders: For the Dual CLIP Loader you need two files. Download Gemma 3.1 2B Safetensors and LTX 2 90B Distilled BF16. Both files go inside your models/text_encoder folder.

https://huggingface.co/Kijai/LTXV2_comfy/tree/main/text_encoders

  • The Upscalers: If you want to use the low-resolution trick, you need the upscaler models. Get the Spatial Upscaler X2 to increase resolution, and get the Temporal Upscaler to increase your FPS. Both files go inside your models/LatentUpscaleModel folder.

https://huggingface.co/Lightricks/LTX-2/tree/main

File Locations

Let’s place the files correctly.

There are two VAE files. You need LTX Video VAE BF16 and LTX Audio VAE BF16.

Make sure the Audio file goes inside the models/checkpoints folder. The Video VAE BF16 goes inside the models/VAE folder.

Now let’s talk about the Dual CLIP Loader. There are two files: Gemma 3.1 2B Safetensors and LTX 2 90B Distilled BF16.

These two files go inside your models/text_encoder folder. You will see two versions, Distilled and Non-Distilled. I just go with Distilled.
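Before launching, I find it worth double-checking that every file landed in the right folder. Here is a small, hypothetical checker; all filenames below are placeholders, so swap in the exact names you downloaded, and match the folder names and casing to your own ComfyUI install. The Q4 GGUF model itself goes wherever your GGUF loader looks, commonly models/unet or models/diffusion_models.

```python
# Hypothetical sanity check for the folder layout used in this guide.
# Every filename below is a placeholder; replace with your downloaded names.
from pathlib import Path

models = Path("ComfyUI/models")
expected = {
    "checkpoints": ["ltx2_audio_vae_bf16.safetensors"],   # audio VAE
    "vae": ["ltx2_video_vae_bf16.safetensors"],           # video VAE
    "text_encoder": [
        "gemma_3.1_2b.safetensors",
        "ltx2_90b_distilled_bf16.safetensors",
    ],
    "LatentUpscaleModel": [
        "spatial_upscaler_x2.safetensors",
        "temporal_upscaler.safetensors",
    ],
}

for folder, files in expected.items():
    for name in files:
        path = models / folder / name
        print(("OK      " if path.exists() else "MISSING ") + str(path))
```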

The Enhancer Warning

If you look at the official workflow released by LTX, you will find an Enhance section.

If you use these nodes, you will get an Out of Memory error. You need 96GB of RAM or more to run this.

So skip this part. I have removed the Enhancer from my workflow. Just copy the prompt line and use Gemini or ChatGPT to enhance your prompt.

Resolution and Upscaling Settings

First of all, look at the resolution.

Most of you might choose 1280 by 720. This is the wrong resolution.

Make sure the resolution is divisible by 32, so 1280 by 736 is the best resolution for high-quality output. Whatever resolution you select, it must be divisible by 32.

Low-VRAM users can go with 640 by 360. High-end users can go with 1280 by 736.
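If you want a custom size, the quick way to get a valid value is to round each dimension up to the nearest multiple of 32. A tiny helper as an illustration:

```python
def snap_up_to_32(value: int) -> int:
    """Round a width or height up to the nearest multiple of 32 (e.g. 720 -> 736)."""
    return ((value + 31) // 32) * 32

print(snap_up_to_32(1280), snap_up_to_32(720))  # 1280 736
```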

Then, at the end of the workflow, you upscale the video using an Upscale Model.

If you want to double the resolution, use the Spatial Upscaler X2.

But if you want to double your FPS, like turning a 25 fps video into 50 fps, without changing the resolution, you have to use a different upscaler called the Temporal Upscaler.

So remember: Spatial is for resolution and Temporal is for FPS. Both of these model files go inside your models/LatentUpscaleModel folder.
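A quick way to keep the two straight is to think about what changes in the output. Illustrative numbers only:

```python
# What each upscaler changes in the output (illustrative numbers).
width, height, fps = 640, 360, 25

spatial_x2 = (width * 2, height * 2, fps)  # Spatial Upscaler X2: 1280 x 720, still 25 FPS
temporal = (width, height, fps * 2)        # Temporal Upscaler: same 640 x 360, now 50 FPS
print(spatial_x2, temporal)
```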

FPS and Steps Settings

Next is the video length. I made this very easy.

You just enter your length in seconds, for example 20, and it automatically calculates the frames. You don’t need to type the frame count manually; just type the seconds.
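Under the hood this is simple arithmetic: seconds times FPS. A trivial illustration, not the node’s actual code:

```python
def seconds_to_frames(seconds: float, fps: int = 25) -> int:
    """Roughly what the length widget computes: seconds * FPS."""
    # The workflow may round this further to a frame count the model prefers.
    return round(seconds * fps)

print(seconds_to_frames(20))  # 500 frames at 25 FPS
print(seconds_to_frames(50))  # 1250 frames at 25 FPS
```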

Another thing: I use 25 FPS instead of 24. This model is trained on 50 FPS, so I recommend using 25 FPS, but you can go with 24.

For steps and sampler: I use the Distilled model, so 8 steps is best. The CFG is set to 1.

For the sampler I generally use Euler for speed. If you want more detail, you can use RSA_2M, which is DPM++ 2M. It will add a little more detail to your image.

How the Camera LoRA Works

Finally, let me talk about the camera LoRA. I created a scene so you can see the difference.

First, here is the video without any camera input. It keeps going, but there is no camera movement. It feels static.

So how can you use the camera here?

You just have to select the Dolly In camera LoRA. I selected Dolly In. The camera pushes in from behind and makes the scene feel more natural.

I hit run. Now you can see how good it looks. When they are running, the camera moves with them and stays behind them. This is how you properly use the camera.

But I found a limitation. When the camera changes and enters a completely new room, the model struggles to create the new environment.

This workflow really changes everything for low-end cards. Give it a try and see if you can break the one-minute limit too.

Workflow

By Esha

Studied Computer Science. Passionate about AI, ComfyUI workflows, and hands-on learning through trial and error. Creator of AIStudyNow — sharing tested workflows, tutorials, and real-world experiments.