I Tried to Max Out an RTX 3090 with LTX-2.3’s Full Dev Model — It Crashed 4 Times in a Row


When a 24GB GPU is sitting on your desk, the urge to push it harder is almost impossible to resist. After successfully generating a 10-second video with LTX-2.3’s distilled model in my previous test, I swapped in the full dev model, the higher-quality variant Lightricks ships alongside the distilled checkpoint. I expected slower performance, sure, but a successful generation. Instead I got four consecutive, perfectly reproducible crashes, all failing at the exact same step.

The real story isn’t about hitting GPU limits, though. It’s about something most people overlook when they talk about VRAM constraints: system RAM exhaustion that shows up when the model is loaded from a GGUF-quantized file. The GPU had headroom every single time. The CPU side? That’s where things fell apart.

This article walks through every crash, every variable I tested, and the diagnostic pattern that emerged. If you’re planning to run LTX-2.3 on similar hardware, this is essential reading.

At a Glance: Test Summary

Test	Model	Quantization / Format	CFG	VAE tile_size / temporal_size	Result	Failure Point
Baseline (Success)	Distilled	.safetensors	1.0	384 / 4096	✅ Completed	N/A
Attempt 1	Dev	Q8_0	3.0	384 / 4096	❌ OOM Kill	Video VAE load
Attempt 2	Dev	Q6_K	3.0	384 / 4096	❌ OOM Kill	Video VAE load
Attempt 3	Dev	Q6_K	3.0	256 / 64	❌ OOM Kill	Video VAE load
Attempt 4	Dev	Q6_K	1.0	384 / 4096	❌ OOM Kill	Video VAE load

The Setup: Same Machine, Same Prompt, Different Model

Everything stayed constant except the model file itself:

GPU: RTX 3090 (24,122 MB VRAM reported by ComfyUI)
System RAM: 32GB total (32,014 MB reported)
ComfyUI version: v0.27.0
Pinned memory: 28,812 MB (ComfyUI reserves this upfront for async weight streaming)

Here’s the critical detail: after ComfyUI started up, only 3–4GB of system RAM remained freely available before any of these tests even began.
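If you want to check that number on your own machine, here’s a minimal sketch (Linux only) that reads MemAvailable from /proc/meminfo. The helper names parse_meminfo and available_mb are mine, not part of ComfyUI, and MemAvailable is the kernel’s own estimate, so treat it as a rough guide rather than an exact budget.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
            fields[name.strip()] = int(parts[0])
    return fields


def available_mb(text: str) -> float:
    """Return MemAvailable converted from kB to MB."""
    return parse_meminfo(text)["MemAvailable"] / 1024


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(f"MemAvailable: {available_mb(meminfo.read_text()):.0f} MB")
```

Running it once after ComfyUI has finished starting and again right before queueing a prompt gives you a before/after picture of system RAM.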

For baseline comparison, the distilled transformer (ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32.safetensors) loaded without issue via DiffusionModelLoaderKJ, staged 22,914 MB into VRAM, and finished a full 10-second video generation (8 base steps + 3 refinement steps) in 463.39 seconds with zero memory problems.

The dev model tests all used the same prompt (a boxing-gym scene), same seed, same graph structure—only the model loader node changed from DiffusionModelLoaderKJ (native .safetensors) to UnetLoaderGGUF (GGUF quantized files).

👉 Quick takeaway: The distilled model worked flawlessly on this hardware; the dev model crashed consistently at the same point, suggesting the issue is tied to how GGUF files are loaded and dequantized, not raw model size.

Attempt 1: Q8_0 Quantization with Full CFG

Model: ltx-2.3-22b-dev-Q8_0.gguf (22.75 GB on disk)
Settings: 20 base steps, cfg=3.0 (base and refinement), 3 refinement steps

The Q8_0 file didn’t fit entirely in VRAM alongside the already-resident Gemma text encoder. ComfyUI logged:

Model loaded partially; 19787.45 MB usable, 19774.67 MB loaded, 2193.30 MB offloaded

The base stage (20 steps) completed in 6:40 (~20 seconds per step). Refinement took 7:04 (~141 seconds per step), much slower than the distilled model’s 69s/step. Two things account for that: cfg=3.0 requires a double forward pass, and 2.19 GB of weights had been offloaded to system RAM.

Audio VAE loaded fine (693.46 MB). Then the process died.

The crucial discovery came from checking journalctl -k:

Out of memory: Killed process 8468 (python3) total-vm:150768544kB, anon-rss:3138148kB

The Linux kernel’s OOM killer terminated ComfyUI directly. This was not a CUDA out-of-memory error but system RAM exhaustion, with the OOM killer invoked by kswapd0 during memory reclaim. The GPU itself still had roughly 4GB of VRAM free.

💡 Tip: Check the kernel log (journalctl -k), not just the ComfyUI console. OOM kills show up there first, which tells you whether it’s a GPU or CPU memory issue.
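To pull the useful numbers out of a kill line like the one above, a small parser is enough. This is a sketch that assumes the kernel message keeps the format shown in my log; the function name parse_oom_kill is my own, and other kernel versions may lay out the fields differently.

```python
import re

# Matches kernel lines such as:
# "Out of memory: Killed process 8468 (python3) total-vm:150768544kB, anon-rss:3138148kB"
OOM_KILL = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)"
    r".*?total-vm:(?P<total_vm_kb>\d+)kB"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kill(line: str):
    """Return pid, process name, and memory figures in MB, or None if no match."""
    match = OOM_KILL.search(line)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "comm": match["comm"],
        "total_vm_mb": int(match["total_vm_kb"]) / 1024,
        "anon_rss_mb": int(match["anon_rss_kb"]) / 1024,
    }


line = ("Out of memory: Killed process 8468 (python3) "
        "total-vm:150768544kB, anon-rss:3138148kB")
print(parse_oom_kill(line))
```

Feeding it the output of journalctl -k line by line tells you which process was killed and how much memory the kernel attributed to it at that moment.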

Attempt 2: Smaller Quant, Zero Offload

Model: ltx-2.3-22b-dev-Q6_K.gguf (17.77 GB on disk)
Settings: 15 base steps (matching Lightricks’ official scheduler), cfg=3.0, 3 refinement steps

I picked Q6_K to test whether CPU offload was the culprit. This time, the console reported:

Loaded completely; 19889.70 MB usable, 17218.07 MB loaded, full load: True

The entire model fit in VRAM with zero CPU offload. That immediately ruled out the offload hypothesis.

Base stage (15 steps): 5:18 (~21.2 seconds per step). Refinement (3 steps): 7:05. Identical crash at the identical point:

Requested to load VideoVAE
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged
[Process killed by OOM killer - pid 14042]

This was the turning point: the crashes weren’t about VRAM—they were about system RAM, and they happened regardless of whether the model fit entirely in VRAM.

📌 Keep in mind: A model that fits completely in VRAM can still crash the system if the loading mechanism itself consumes unexpected amounts of CPU-side memory.

Attempt 3: VAE Tile Size Reduction

Model: Same Q6_K
Settings: Same as Attempt 2, but with VAEDecodeTiled parameters changed:

tile_size: 384 → 256
temporal_size: 4096 → 64

I reduced temporal_size because the original 4096 meant no actual temporal chunking for a 251-frame video (the chunk size was larger than the entire video). System RAM showed 27–29 GB available before the run started.
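To make the chunking point concrete, here’s a simplified sketch of how many temporal chunks a given temporal_size implies. It is not ComfyUI’s actual VAEDecodeTiled code: the function name temporal_chunks and the overlap handling are my own assumptions, and it counts in video frames rather than latent frames.

```python
import math


def temporal_chunks(num_frames: int, temporal_size: int, overlap: int = 0) -> int:
    """Simplified count of temporal chunks needed to cover num_frames."""
    if temporal_size <= overlap:
        raise ValueError("temporal_size must be larger than overlap")
    if num_frames <= temporal_size:
        return 1  # the whole clip fits in one chunk, so nothing is split
    stride = temporal_size - overlap
    return 1 + math.ceil((num_frames - temporal_size) / stride)


print(temporal_chunks(251, 4096))  # original setting
print(temporal_chunks(251, 64))    # reduced setting
```

Under this simplified model the 251-frame clip is a single chunk at 4096 and four chunks at 64, which is why I expected the smaller value to lower peak memory during decode.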

Result: identical crash at the identical point. This time the kernel log named a ComfyUI worker thread’s allocation request as the trigger (‘Thread-3 (promp invoked oom-killer’; the kernel truncates thread names to 15 characters) rather than background memory reclaim. That is a different trigger path but the same underlying exhaustion, and the OOM killer killed pid 15297.

Attempt 4: CFG Back to 1.0

Model: Same Q6_K
Settings: cfg=1.0 for both guiders (matching the distilled baseline exactly), 15 base steps, 3 refinement steps

Per-step timing confirmed the cfg=3.0 double pass was real: the base stage dropped from ~21s/step to ~10.5s/step (157 seconds total for 15 steps). Refinement dropped to 3:31 (211 seconds, ~70.3s/step, essentially matching the distilled model’s 69s/step).
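The per-step figures quoted across the attempts are plain division of stage time by step count. Here’s a short sketch that recomputes them from the durations I logged; the helper names to_seconds and seconds_per_step are just for illustration.

```python
def to_seconds(duration: str) -> int:
    """Convert 'M:SS' (or 'H:MM:SS') into seconds."""
    total = 0
    for part in duration.split(":"):
        total = total * 60 + int(part)
    return total


def seconds_per_step(duration: str, steps: int) -> float:
    """Average seconds per sampling step for one stage."""
    return to_seconds(duration) / steps


# (label, stage duration, number of steps) as reported for each attempt
runs = [
    ("Attempt 1 base", "6:40", 20),
    ("Attempt 1 refinement", "7:04", 3),
    ("Attempt 2 base", "5:18", 15),
    ("Attempt 2 refinement", "7:05", 3),
    ("Attempt 4 base", "2:37", 15),  # 157 seconds
    ("Attempt 4 refinement", "3:31", 3),
]
for label, duration, steps in runs:
    print(f"{label}: {seconds_per_step(duration, steps):.1f} s/step")
```

The base stage roughly halves between cfg=3.0 and cfg=1.0, which is what you’d expect if the higher setting runs two forward passes per step.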

Even with cfg matching the successful distilled run, the outcome didn’t change: the process crashed at the same point, and the OOM killer killed pid 16517.

The Diagnostic Pattern: What Changed, What Didn’t

Variable	Attempt 1	Attempt 2	Attempt 3	Attempt 4
Model File	Q8_0	Q6_K	Q6_K	Q6_K
File Size (Disk)	22.75 GB	17.77 GB	17.77 GB	17.77 GB
CPU Offload	Yes (2.19GB)	No	No	No
VRAM Headroom at Crash	~4 GB	~4 GB	~4 GB	~4 GB
VAE tile_size / temporal_size	384 / 4096	384 / 4096	256 / 64	384 / 4096
CFG Value	3.0	3.0	3.0	1.0
Crash Point	Video VAE	Video VAE	Video VAE	Video VAE
OOM Killer Triggered	Yes	Yes	Yes	Yes
Result	❌ CRASH	❌ CRASH	❌ CRASH	❌ CRASH

Every single attempt crashed at the exact same point: immediately after the audio VAE loaded successfully, at the moment the video VAE was requested for loading, before any decode progress was logged.

Among the variables I changed, the only one that stayed constant across all four failures and also differed from the single successful run was the loading mechanism: the dev model was loaded via UnetLoaderGGUF from a .gguf file, while the distilled model was loaded via DiffusionModelLoaderKJ from a native .safetensors file. The dev checkpoint itself is tied to that loader in these tests, so the two can’t be fully separated here.
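The elimination logic can be written out as a small check: keep every setting that has a single value across all failed runs and a different value in the successful run. This sketch uses only settings stated in this article; the dictionary keys and the suspects function are my own naming.

```python
# One successful run (distilled) and four failed runs (dev), as tested above.
baseline = {
    "variant": "distilled",
    "loader": "DiffusionModelLoaderKJ",
    "quant": "mxfp8",
    "cfg": 1.0,
    "vae_tile": (384, 4096),  # (tile_size, temporal_size)
    "base_steps": 8,
}
dev = {"variant": "dev", "loader": "UnetLoaderGGUF"}
failures = [
    {**dev, "quant": "Q8_0", "cfg": 3.0, "vae_tile": (384, 4096), "base_steps": 20},
    {**dev, "quant": "Q6_K", "cfg": 3.0, "vae_tile": (384, 4096), "base_steps": 15},
    {**dev, "quant": "Q6_K", "cfg": 3.0, "vae_tile": (256, 64), "base_steps": 15},
    {**dev, "quant": "Q6_K", "cfg": 1.0, "vae_tile": (384, 4096), "base_steps": 15},
]


def suspects(success: dict, failed_runs: list) -> list:
    """Settings constant across every failure and different from the success."""
    found = []
    for key, good_value in success.items():
        seen = {run[key] for run in failed_runs}
        if len(seen) == 1 and good_value not in seen:
            found.append(key)
    return sorted(found)


print(suspects(baseline, failures))
```

The check leaves two entries standing, the loader and the model variant, because every dev run also used the GGUF loader. That is why I describe the loading path as the strongest correlation rather than an isolated cause.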

The Most Likely Explanation (With Caveats)

This is a strong correlation established through systematic elimination, not a proven root cause traced in source code.

GGUF files are quantized and have to be dequantized on the fly during inference. My hypothesis is that this dequantization path retains intermediate buffers in CPU memory, and that these buffers, accumulated across the base stage (15–20 steps) and the refinement stage (3 steps), aren’t released before the video VAE requests its 1,384 MB staging allocation. On a machine with only 3–4 GB of freely available system RAM after ComfyUI’s pinned memory reservation, that would be enough to exhaust the CPU-side memory pool while the GPU still has headroom.

The distilled model, loaded as a native .safetensors file, doesn’t require dequantization and likely doesn’t accumulate the same persistent CPU-side buffers. It completed successfully under the same memory constraints.

What matters: this is an inference from behavior, not a confirmed code-level root cause. The GGUF loading path is the common factor, but I haven’t traced the source code to definitively prove where the retained memory lives or why it isn’t released.

⚠️ Important: GGUF dequantization overhead is the strongest remaining hypothesis, but it’s a behavioral correlation, not a code-traced root cause—the distinction matters if you’re trying to reproduce this or debug it yourself.
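For a sense of how tight the margin is under this hypothesis, here’s a back-of-the-envelope budget using the figures from my logs. It is a deliberately crude model and rests on assumptions: it treats the pinned-memory figure as fully committed, and it ignores the OS, page cache, and swap. The function names are mine.

```python
TOTAL_RAM_MB = 32_014        # system RAM reported by ComfyUI
PINNED_MB = 28_812           # pinned memory figure reported at startup
VIDEO_VAE_STAGED_MB = 1_384  # staging size logged for the video VAE


def headroom_mb(total: float, pinned: float, retained: float = 0.0) -> float:
    """Free system RAM left after the pinned pool and any retained buffers."""
    return total - pinned - retained


def max_retained_mb(total: float, pinned: float, request: float) -> float:
    """Largest retained-buffer size that still leaves room for `request`."""
    return headroom_mb(total, pinned) - request


print(headroom_mb(TOTAL_RAM_MB, PINNED_MB))
print(max_retained_mb(TOTAL_RAM_MB, PINNED_MB, VIDEO_VAE_STAGED_MB))
```

In this model, roughly 3.2 GB is left after the pinned pool, so about 1.8 GB of lingering CPU-side buffers would already be enough to leave the video VAE’s staging request short. The real accounting is certainly messier, but it shows why a modest amount of retained memory could matter on a 32GB machine.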

What This Means for Your Setup

If you have a 24GB GPU but only 32GB of system RAM, expect a gap between the LTX-2.3 dev and distilled models: in my testing, the dev model loaded via GGUF failed reproducibly at the video VAE loading stage. The GPU itself was not the constraint. Every attempt had comfortable VRAM headroom throughout (peak usage never exceeded about 20GB of the 24GB available).

The practical takeaway: check your system RAM, not just your VRAM, before attempting this specific dev-model pipeline. Be aware that the GGUF-loaded path in particular may be the source of instability rather than model size alone.

If you’re in this situation, these are the options. Only the first is confirmed by my testing:

Use the distilled model (confirmed working on 32GB system RAM)
Increase system RAM to 48GB or higher (untested; should reduce memory pressure if the GGUF hypothesis holds)
Reduce the number of sampling steps further (below 15 base steps; untested but theoretically viable)
Load the dev model as a native .safetensors file instead of GGUF (requires ~68GB VRAM total, so not feasible on a 24GB GPU)

FAQ

Q: Is the LTX-2.3 dev model’s problem VRAM or system RAM?
A: System RAM, based on this testing. Across 4 crash reproductions, VRAM usage never exceeded roughly 20GB of the available 24GB, but the Linux OOM killer terminated the ComfyUI process every time due to system RAM exhaustion on a 32GB machine, right when the video VAE tried to load.

Q: Does reducing CFG or VAE tile size fix the LTX-2.3 dev model crash?
A: Not in this testing. Both were tried as isolated fixes (cfg 3.0 → 1.0, VAE temporal_size 4096 → 64) and neither prevented the crash—it happened at the identical point in execution every time regardless.

Q: Does using a smaller GGUF quant (Q6_K vs Q8_0) fix the crash?
A: No. Q6_K (17.77GB) fit entirely in VRAM with zero CPU offload, unlike Q8_0 which needed to offload ~2.2GB—but both crashed identically, ruling out offload and quant size as the cause.

Q: Why did the distilled LTX-2.3 model work fine but the dev model crash?
A: The clearest remaining difference is the loading path: the distilled model was loaded via KJNodes’ DiffusionModelLoaderKJ from a native .safetensors file, while the dev model was loaded via ComfyUI-GGUF’s UnetLoaderGGUF from a .gguf file. This is a strong correlation across 4 tests, not a confirmed root cause traced in source code.

Keep Reading

If GGUF quantization is new territory for you, our guide to GGUF models in ComfyUI covers what quantization actually does to model quality and memory. For the successful run this article follows up on, see our LTXV-2.3 + RTX Super Resolution walkthrough, which has the full working distilled-model pipeline. And if you’re weighing hardware for video generation workflows in general, our Best GPU for ComfyUI guide is a practical starting point.

🏆 Our recommendation

If you’re running a 24GB GPU with 32GB system RAM: stick with the distilled LTX-2.3 model. It ran to completion without memory problems in my test, generates quality output, and avoids the GGUF loading path entirely. The quality difference doesn’t justify the crash risk on this hardware.

If you have 48GB+ system RAM: the dev model via GGUF may become viable, though I haven’t tested it. The extra system RAM headroom should accommodate the suspected dequantization buffers without triggering the OOM killer. Test with Q6_K first (smaller file size, faster load), then experiment with Q8_0 if quality matters more than speed.

If you want the absolute best quality and have the VRAM: load the dev model as a native .safetensors file. You’ll need far more than 24GB of VRAM for that (my rough estimate above is ~68GB in total), which means data-center hardware such as an H100 or a dual-GPU setup. On a 24GB RTX 3090, this isn’t feasible.
