How to Run LTX 2.3 on an 8GB GPU Without OOM Errors

Esha Sharma

Running LTX 2.3 in ComfyUI usually triggers an Out of Memory (OOM) error on an 8GB GPU. However, by changing how ComfyUI loads and processes your models, you can keep VRAM usage within what an 8GB card can safely handle.

I have tested this exact workflow with a live performance monitor. It generates a full video in about one minute while keeping VRAM usage at safe levels.

Here is how you set it up.

Download sources

  • ltx-2.3-22b-distilled-1.1-Q4_K_M.gguf: Unsloth LTX 2.3 GGUF, distilled 1.1 Q4_K_M, about 14.2GB.
  • gemma-3-12b-it-qat-UD-Q3_K_XL.gguf: Unsloth Gemma 3 12B QAT GGUF Q3, about 6.14GB.
  • ltx-2.3_text_projection_bf16.safetensors: Kijai LTX 2.3 Comfy text projection, about 2.31GB.
  • LTX23_video_vae_bf16.safetensors: Kijai LTX 2.3 Comfy VAE folder, video VAE, about 1.45GB.
  • LTX23_audio_vae_bf16.safetensors: Kijai LTX 2.3 Comfy VAE folder, audio VAE, about 365MB.
  • ltx-2.3-spatial-upscaler-x2-1.1.safetensors: Lightricks LTX 2.3 spatial upscaler x2 v1.1, about 996MB.
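If you prefer the command line, here is a rough download sketch using huggingface-cli. Fair warning: the repository IDs below are placeholders invented for illustration; look up the real Unsloth, Kijai, and Lightricks repositories on Hugging Face before running anything. Only the filenames and target folders come from this guide.

:: Placeholder repo IDs -- substitute the real ones from each model page
huggingface-cli download unsloth/ltx-2.3-gguf ltx-2.3-22b-distilled-1.1-Q4_K_M.gguf --local-dir ComfyUI/models/unet
huggingface-cli download unsloth/gemma-3-12b-it-qat-gguf gemma-3-12b-it-qat-UD-Q3_K_XL.gguf --local-dir ComfyUI/models/text_encoders

Repeat the same pattern for the remaining four files, pointing --local-dir at the folders shown in the next section.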

Exact folder structure

Use this:

ComfyUI/
└── models/
    ├── unet/
    │   └── ltx-2.3-22b-distilled-1.1-Q4_K_M.gguf
    ├── text_encoders/
    │   ├── gemma-3-12b-it-qat-UD-Q3_K_XL.gguf
    │   └── ltx-2.3_text_projection_bf16.safetensors
    ├── vae/
    │   └── ltx/
    │       ├── LTX23_video_vae_bf16.safetensors
    │       └── LTX23_audio_vae_bf16.safetensors
    └── latent_upscale_models/
        └── ltx-2.3-spatial-upscaler-x2-1.1.safetensors
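You can create any missing folders in one go from a command prompt opened inside your ComfyUI directory. Windows cmd creates the intermediate directories automatically, and if a folder already exists it simply reports that and moves on:

mkdir models\unet models\text_encoders models\vae\ltx models\latent_upscale_models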

The Essential ComfyUI Startup Commands

Before you load any models, you must force ComfyUI into an aggressive memory-saving mode. If you skip this step, an 8GB GPU will crash immediately.

If you use the portable version of ComfyUI, right-click your run_nvidia_gpu.bat file and click Edit. Paste these exact flags right after main.py:

  • --lowvram: This is your most important flag. It tells ComfyUI to manage your limited video memory carefully.
  • --reserve-vram 1 (or 2): This keeps 1 to 2 gigabytes of VRAM free for Windows, which helps prevent your entire computer from locking up. (Note: if you have a larger card and want to simulate an 8GB limit for testing, you can increase this number. For example, I tested this on a 32GB card using --reserve-vram 24 to leave exactly 8GB for ComfyUI.)
  • --cache-lru 10: This caps the number of cached node results at 10, which prevents unnecessary memory spikes.
  • --preview-method taesd: This switches the live preview to a very lightweight decoder. If you still get an Out of Memory error later on, you can change this to --preview-method none to save even more memory.

The full launch line should look like this:

python main.py --lowvram --reserve-vram 1 --cache-lru 10 --preview-method taesd
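For reference, in the portable build the edited run_nvidia_gpu.bat usually ends up looking roughly like this; the stock line may differ slightly between versions, so keep whatever is already there and just append the four flags after main.py:

.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --lowvram --reserve-vram 1 --cache-lru 10 --preview-method taesd
pause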

The Best Models for a Low VRAM Workflow

You must choose the right model files to save memory. A standard setup will crash an 8GB card, so you need compressed versions.

Use the Distilled Q4_K_M Model

First, use the distilled 1.1 Q4_K_M model. This saves a massive amount of VRAM because you do not need to load an extra LoRA file.

Compress with the Gemma 3 GGUF File

Next, move to the DualCLIPLoader and load the Gemma 3 GGUF file. The GGUF version compresses a massive AI model down to just 6.14 gigabytes, which is exactly how we force the heavy text encoder to fit inside low VRAM.

You will also need the LTX 2.3 text projection file (2.31 gigabytes) and two VAE files.

Required Low VRAM Patches for ComfyUI

Even with compressed models, you need to manage your memory while the video is actively generating.

You should add a “Low VRAM Patches” group to your ComfyUI workflow. Take your loaded model and pass it through these three patches:

  1. Sage Attention
  2. Memory-Efficient Attention
  3. Chunk Feed-Forward

These patch nodes reduce your peak memory usage. Because they free up so much headroom, you should also add the LTX NAG node, which helps stop the AI from generating bad anatomy.
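One assumption worth flagging: the Sage Attention patch generally depends on the separate sageattention Python package being installed. If the node reports that it is missing, installing it into the portable build typically looks like this:

.\python_embeded\python.exe -m pip install sageattention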

The Two-Pass Generation Trick (The Secret to 8GB)

This is the most important step. I divide the video generation into two separate passes. If you try to do it all at once, you risk a crash.

Pass 1: Safe Base Generation

Bypass the second pass and the upscaler, then set the first pass to run the first six steps.

When you hit run, the text encoder takes about 3.7GB of VRAM, and prompt processing takes about 5.6GB. During the first pass itself, VRAM spikes to 8.1GB. The pass takes about 44 seconds, and when it finishes, VRAM drops safely back down to around 3.6GB.
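If you want to verify these numbers on your own card, you can log VRAM usage once per second from a second terminal with nvidia-smi while the pass runs:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1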

Pass 2: Finishing the Video

Now, enable the second pass and hit run again.

Because you launched with the --cache-lru setting, ComfyUI will not start over from the beginning; it resumes exactly where the first pass ended. The second pass runs two more steps, takes around 8GB of VRAM again, and finishes in about 15 seconds.

How to Upscale Safely Without Crashing

If you want a higher resolution, do not use the built-in ComfyUI upscale section. I tested this, and the VRAM demand spikes to around 14GB. It will not fail outright, but it takes a very long time on an 8GB GPU.

Instead, use this better method:

  1. Download your generated LTX 2.3 video.
  2. Load that video back into a fresh ComfyUI workflow.
  3. Use the ‘RTX Video Super Resolution’ node.

This keeps your system fast and prevents your VRAM from overloading.

Bonus Tip: Use --preview-method none

If you still get an Out of Memory error during your generations, you can apply a simple command line trick.

Open your run_nvidia_gpu.bat file, click Edit, and change --preview-method taesd to --preview-method none after your main.py command. This stops ComfyUI from rendering live previews, which saves even more video memory.
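With that change, the full launch line from earlier becomes:

python main.py --lowvram --reserve-vram 1 --cache-lru 10 --preview-method none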
