LTX-2.3 in ComfyUI Workflow

Esha Sharma

High-quality local video generation is finally here. LTX-2.3 is a large video model that runs directly on your own computer and replaces the older LTX releases. The new version brings a rebuilt VAE for sharper textures, native 9:16 portrait support, and native audio generation.

Running this model inside ComfyUI is not easy. Users often fight empty JSON errors, out-of-memory crashes, and ugly grid artifacts. I have worked through each of these problems.

This guide gives you the exact files you need to download. I will show you the strict math rules for video sizing. I will also share the workflow tricks required to stop character faces from drifting. Let's build your local AI video studio right now.

The Required Model Files for LTX-2.3

To run LTX-2.3 properly, you must download the main diffusion model, the dual text encoders, the split VAE files, and the upscalers.

The full model requires 32 gigabytes of VRAM. If your graphics card has less memory, download the FP8, FP4, or GGUF quantized versions instead to avoid crashes.

Place these exact files into your ComfyUI folders:

(Safety Note: I have scanned all these files locally. They are safe to use.)

How to Configure ComfyUI Nodes and Math Rules

Update your software first. You must use ComfyUI version 0.16.1 or higher before you start. Open the Manager, search for “LTXVideo”, and install the ComfyUI-LTXVideo custom nodes. Then restart ComfyUI.

Place each model into its correct folder, or ComfyUI will silently ignore your files:

  • Put the main model in your models/checkpoints folder.
  • Move the Gemma 3 files and the text projection files into the models/text_encoders folder.
  • Place both VAE files into the models/vae folder.
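The folder layout above is easy to get wrong. Here is a small sketch that checks the three expected model folders exist before you launch; the `COMFYUI_ROOT` path and the helper name are my own assumptions, so adjust them to your install.

```python
# Sketch: verify the expected ComfyUI model folders before launching.
# COMFYUI_ROOT is an assumption -- point it at your actual install.
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")

EXPECTED_DIRS = [
    "models/checkpoints",    # main diffusion model
    "models/text_encoders",  # Gemma 3 + text projection files
    "models/vae",            # both VAE files
]

def check_layout(root: Path) -> list[str]:
    """Return the expected model folders that are missing under root."""
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

for missing in check_layout(COMFYUI_ROOT):
    print(f"Missing folder: {missing}")
```

If the script prints nothing, your layout matches the list above.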

Load your nodes. You must use the DualCLIPLoader node for this model. Select the Gemma 3 file in the first box. Select the LTX-2.3 text projection file in the second box.

Set your video size using strict math. Your width and height must divide exactly by 32. Do not type 1080 for Full HD. It fails the math test. Type 1088 instead.

Frame counts are equally strict. Use a multiple of 8, plus one. You must use exact numbers like 25, 121, 241, or 257 frames.
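The two math rules above can be sketched as a pair of small helpers that snap any requested size to the nearest valid value. The function names are my own, not part of ComfyUI; the rules (multiples of 32 for dimensions, 8n + 1 for frames) come straight from the text.

```python
# Snap video settings to LTX-2.3's sizing rules:
# width/height must be exact multiples of 32, frames must be 8*n + 1.

def snap_dimension(value: int) -> int:
    """Round a width or height to the nearest multiple of 32."""
    return max(32, round(value / 32) * 32)

def snap_frames(value: int) -> int:
    """Round a frame count to the nearest valid 8*n + 1 value."""
    return max(9, round((value - 1) / 8) * 8 + 1)

print(snap_dimension(1080))  # -> 1088, not 1080
print(snap_frames(120))      # -> 121
```

This is why 1080 fails and 1088 works: 1080 / 32 is not a whole number, but 1088 / 32 = 34 exactly.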

Adjust your sampler. If you run the heavy Dev model without the speed add-on, set your steps to exactly 20. Set your CFG scale to 3.0. Do you want to use the Distilled LoRA? Change your steps to exactly 8. Set your CFG to 1.0.
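The two sampler presets above fit in a small lookup table. The preset labels are my own; the steps and CFG values are the ones given in the text.

```python
# Sampler presets for LTX-2.3, as described above.
# "dev" = heavy Dev model without the speed add-on,
# "distilled" = running with the Distilled LoRA.
SAMPLER_PRESETS = {
    "dev":       {"steps": 20, "cfg": 3.0},
    "distilled": {"steps": 8,  "cfg": 1.0},
}

def preset(mode: str) -> dict:
    """Return the sampler settings for the chosen mode."""
    return SAMPLER_PRESETS[mode]

print(preset("distilled"))  # {'steps': 8, 'cfg': 1.0}
```

Copy these values into your KSampler node by hand; mixing them up (Distilled LoRA with CFG 3.0, for example) is a common cause of washed-out output.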

Expert Workflow Hacks to Stop Visual Artifacts

Do not trust the default workflow templates. They have major flaws. You must change them manually.

Fix the Image-to-Video drift: The standard ComfyUI workflow compresses your input image before generation. This destroys micro-textures. It causes character faces to drift instantly. Skip the downscale node entirely. Feed your reference image at its native resolution.

Use the FFLF method: Try the First Frame Last Frame method. You input a starting image and an ending image. The model interpolates the motion between them smoothly. This gives you exact camera control. It completely bypasses the need for heavy text prompts.

Stop the audio hallucinations: If you leave the audio prompt empty, the model will invent random background music. Negative prompts do not work here. You must explicitly command the audio. Type “silent, no sound” or “the steady hum of an air conditioner” to keep the track clean.

Re-inject the guide frame: When you pass your video to the spatial upscaler, use the LTXVAddGuideMulti node. Re-inject your original reference image. This forces the upscaler to preserve the exact face details.

Stabilize high-speed motion: LTX-2.3 fails at complex action scenes. Increase your frame rate to 50 FPS. This feeds the model more temporal tokens. It helps prevent objects from turning into rubber.

Troubleshooting Common Generation Errors

Generation errors usually stem from mismatched files or outdated nodes. Read the console output, then check your files.

  • Ugly grid or net artifacts: Do you see a grid covering your video? You accidentally loaded the old LTX-2 upscaler. You must use the new ltx-2.3-spatial-upscaler-x2-1.0.safetensors file. Short prompts lacking detail also trigger this exact visual artifact.
  • min() iterable argument is empty: Your text encoder API version is wrong. You probably forgot to replace the old LTX-2 text encoder with the new LTX-2.3 text projection file inside the DualCLIPLoader node.
  • DualCLIPLoader: Expecting value: line 1 column 1 (char 0): Your text encoder download failed. Hugging Face often drops connections. This leaves you with an 86 KB file instead of the full 24 GB model. Delete the small file. Download it again.
  • End-of-generation crashes: Does ComfyUI crash at the very end? The VAE decode step consumes massive amounts of memory. Update your Git version of ComfyUI first; the newest backend enables a Dynamic VRAM optimization by default. Then swap your standard VAE node for the VAEDecodeTiled node. This prevents sudden memory spikes and keeps your graphics card alive.
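The truncated-download failure above is easy to catch before ComfyUI chokes on it. Here is a sketch that flags suspiciously small `.safetensors` files; the 1 MB threshold is my own heuristic, not an official number, and the models path is an assumption.

```python
# Sketch: catch truncated Hugging Face downloads. A dropped connection
# can leave a tiny stub (e.g. ~86 KB) where a multi-gigabyte model
# should be. The 1 MB threshold below is a heuristic of mine.
from pathlib import Path

MIN_MODEL_BYTES = 1024 * 1024  # anything smaller is almost certainly a stub

def find_truncated(models_dir: Path) -> list[Path]:
    """Return .safetensors files small enough to be failed downloads."""
    if not models_dir.is_dir():
        return []
    return [
        p for p in models_dir.rglob("*.safetensors")
        if p.stat().st_size < MIN_MODEL_BYTES
    ]

for stub in find_truncated(Path("ComfyUI/models")):
    print(f"Suspicious file (delete and re-download): {stub}")
```

Delete anything this flags and re-download it; the `Expecting value: line 1 column 1 (char 0)` error disappears once the full file is in place.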