CUDA Out of Memory in ComfyUI: How to Fix It (2026)

When you hit the torch.cuda.OutOfMemoryError: CUDA out of memory message mid-generation, it feels like a dead end—but it’s not. Your GPU has the compute power to run ComfyUI, but the VRAM math doesn’t work out. Whether you’re running a 4GB budget card or a 12GB powerhouse, CUDA out of memory in ComfyUI has a practical fix—often several—that will get you generating again without upgrading hardware. For the full picture on every VRAM-saving technique available, see our complete guide to reducing VRAM usage in ComfyUI.

The error happens because your GPU’s VRAM must simultaneously hold the diffusion model (checkpoint/UNET), the text encoder (CLIP or T5XXL), the VAE decoder, the latent buffer during sampling, and the final image framebuffer. A full SDXL model at 6-7GB plus VAE at 800MB leaves an 8GB GPU extremely tight. Full Flux Dev (about 24GB in bf16, 12GB in fp8) is outright impossible on most consumer cards without optimization. The ComfyUI OOM error can strike at three different moments (model loading, KSampler inference, or VAE decoding), and each requires a slightly different approach.
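
As a rough illustration of that VRAM math, the sketch below adds up component sizes and compares the total against a card's capacity. The function name, the headroom value, and the activation estimate are assumptions for illustration; only the 6-7GB and 800MB figures come from the text above, and the text encoder is left out because no size is quoted for it.

```python
# Rough VRAM budget check. Sizes are illustrative estimates in GB,
# not measured values.

def fits_in_vram(components_gb, vram_gb, headroom_gb=0.5):
    """Return (fits, total_gb) for a dict of component sizes.

    headroom_gb is a hypothetical safety margin for the driver, the
    desktop, and allocator overhead; tune it for your own system.
    """
    total = sum(components_gb.values())
    return total + headroom_gb <= vram_gb, total


sdxl = {
    "diffusion_model": 6.5,          # midpoint of the 6-7GB quoted above
    "vae": 0.8,                      # 800MB quoted above
    "latents_and_activations": 0.5,  # assumed placeholder, varies with resolution
}

fits, total = fits_in_vram(sdxl, vram_gb=8)
print(f"SDXL needs about {total:.1f}GB; fits in 8GB: {fits}")
```

With these assumed numbers the total lands just under 8GB, which leaves no room for the headroom margin and matches the "extremely tight" description.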

Quick Reference: Solutions at a Glance

Solution	VRAM Saved	Speed Impact	Best For
Restart ComfyUI	2-4GB	None	Fragmentation issues
`--lowvram` flag	3-5GB	-20-40%	4-6GB GPUs
`--medvram` flag	1-3GB	-5-15%	6-8GB GPUs
Reduce resolution	1-4GB	None	All GPUs
Tiled VAE	1-2GB	None at 768px and below	High-res generation
GGUF quantization	4-8GB	-10-20%	Permanent solution

Understanding Where the Memory Error Occurs

Before jumping to fixes, identify when the ComfyUI low VRAM error happens. This tells you which component is running out of space.

Model loading errors appear immediately when you load a checkpoint or GGUF file. The model weights themselves don’t fit in VRAM.

KSampler errors occur during the sampling/generation loop. Your resolution, batch size, or sampling steps exceed available memory during inference.

VAE decode errors happen at the final step, even if sampling completed successfully. The VAE alone can’t fit the full-resolution tensor in memory.

💡 Tip: Pinpointing whether your error happens at model load, during sampling, or in VAE decoding tells you which fix to try first—and saves time troubleshooting.
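
One way to keep that triage handy is a small lookup from failure stage to the fixes this article suggests trying first. The stage labels below are hypothetical names chosen for this sketch, not ComfyUI identifiers, and the ordering is a suggestion rather than a rule.

```python
# Lookup of "where did it fail" -> "what to try first", following this article.
# The stage keys are hypothetical labels, not ComfyUI identifiers.
FIRST_FIXES = {
    "model_load": ["switch to a quantized GGUF model", "start with --lowvram"],
    "ksampler": ["reduce resolution", "reduce batch size", "start with --lowvram"],
    "vae_decode": ["use tiled VAE decoding", "reduce resolution"],
}


def first_fixes(stage):
    """Return the ordered list of fixes to try for a failure stage."""
    try:
        return FIRST_FIXES[stage]
    except KeyError:
        raise ValueError(
            f"unknown stage {stage!r}; expected one of {sorted(FIRST_FIXES)}"
        )


print(first_fixes("vae_decode"))
```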

Solution 1: Restart ComfyUI Completely

Before trying anything else, fully restart the application.

Don’t just reload the browser page. Close the ComfyUI process entirely—use Task Manager (Windows), Activity Monitor (macOS), or pkill -f "main.py" on Linux. Then restart with python main.py.

Long sessions fragment VRAM. PyTorch’s caching allocator keeps freed blocks reserved for reuse instead of returning them to the driver, so after many model loads and unloads the free space ends up scattered in pieces too small for a large allocation. A clean restart can recover several gigabytes and is often enough on its own for OOM errors that appear suddenly in workflows that used to work fine.

💡 Tip: VRAM fragmentation is real—a full restart often fixes OOM errors that suddenly appear in workflows that used to work, with zero configuration needed.
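
To confirm that a restart actually released memory, you can compare VRAM usage before and after. The sketch below shells out to nvidia-smi, which ships with the NVIDIA driver; the helper names are made up for this example, and it returns None when the tool is not on the PATH.

```python
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' line (values in MiB) from nvidia-smi CSV output."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def query_vram():
    """Return [(used_mib, total_mib), ...] per GPU, or None if nvidia-smi is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return [parse_memory_line(line) for line in out.splitlines() if line.strip()]
```

Call query_vram() once with ComfyUI idle after a long session and once after the restart. A large drop in the used figure suggests the old process was holding on to memory.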

Solution 2: Use Memory Optimization Flags at Startup

ComfyUI includes built-in flags that trade speed for VRAM efficiency. Run these at startup:

--lowvram (4-6GB GPUs) Offloads model parts to system RAM when VRAM fills. Slower—expect 20-40% longer generation times—but enables models that wouldn’t otherwise fit. Essential for 4GB and 6GB cards.

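--medvram (6-8GB GPUs) Balances speed and memory. Offloads less aggressively than --lowvram but more than default. Recommended for 6GB and 8GB cards as a starting point. Note that --medvram originates in the Automatic1111 WebUI and ComfyUI’s own option list may not include it; if the launch fails with an “unrecognized arguments” error, start without a VRAM flag (ComfyUI falls back to low-VRAM loading on its own when a model does not fit) or use --lowvram.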

--novram (extreme cases, 4GB minimum) Moves nearly everything to system RAM. Very slow but works on minimal GPUs. Use only if --lowvram fails.

--cpu (no dedicated GPU) Runs entirely on CPU. Extremely slow (10-100x slower than GPU). Use only for testing workflows without a dedicated graphics card.

Never combine --lowvram and --medvram in the same command. Pick one.

Example startup commands:

python main.py --lowvram
python main.py --medvram
python main.py --novram

If you’re unsure which to use, start with --medvram for 6-8GB cards or --lowvram for 4-6GB cards.

💡 Tip: The --lowvram flag makes ComfyUI offload parts of the model to system RAM when VRAM fills up—it’s slower but lets you generate images that would otherwise cause OOM on any NVIDIA GPU.
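
Available flags can differ between ComfyUI versions, so it can help to check a flag against the program's own help output before relying on it. The sketch below assumes it is run from the ComfyUI folder and that main.py is the entry point, as in the commands above; the helper names are hypothetical.

```python
import re
import subprocess
import sys


def flag_in_help(help_text, flag):
    """True if `flag` appears as a whole option name in argparse-style help text."""
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def comfyui_help(main_py="main.py"):
    """Run `python main.py --help` and return its combined output.

    The main_py path is an assumption; adjust it to your install.
    """
    result = subprocess.run(
        [sys.executable, main_py, "--help"],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

For example, flag_in_help(comfyui_help(), "--lowvram") tells you whether your build accepts that flag before you add it to a launch script.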

Solution 3: Reduce Generation Resolution

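The resolution-dependent share of VRAM use grows with pixel count, so it scales quadratically with the side length. Dropping from 1024px to 768px cuts the pixel count by about 44%, which can come close to halving that share.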

In the Empty Latent Image node, adjust width and height downward:

SD 1.5: Works 512-1024px on 4-8GB GPUs (use multiples of 64)
SDXL: Needs --lowvram at 768px on 6GB, or --medvram at 1024px on 8GB (impossible on 4GB without flags)
Flux GGUF Q4: Works 512px with --lowvram on 4GB, up to 1024px on 8GB
Flux Dev (full): Impossible on any of these without quantization

Recommended resolution steps for multiples of 64 (SD/SDXL) or 16 (Flux):

Resolution	VRAM Load (relative)	Best GPU
512×512	1×	4GB+
640×640	1.56×	4GB+ (Flux GGUF)
768×768	2.25×	6GB+
896×896	3.06×	8GB+
1024×1024	4×	8GB+
1280×1280	6.25×	12GB+

Start 128-192px lower than you think you need. You can always increase it after confirming stability.
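
The relative load figures for square resolutions follow directly from the pixel count. This short sketch reproduces them and adds a helper for snapping a size down to a valid multiple; the function names are chosen for this example.

```python
def relative_load(side_px, base_px=512):
    """Pixel count of a square image relative to a base_px square."""
    return (side_px / base_px) ** 2


def snap_down(value_px, multiple=64):
    """Round down to the nearest multiple (64 for SD/SDXL, 16 for Flux)."""
    return (value_px // multiple) * multiple


for side in (512, 640, 768, 896, 1024, 1280):
    print(f"{side}x{side}: {relative_load(side):.2f}x")
```

This covers only the part of memory that scales with image size. Model weights stay the same at every resolution, so total VRAM use shrinks by less than these ratios suggest.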

Solution 4: Enable Tiled VAE Decoding

VAE decoding needs its own block of VRAM, sized by the output resolution rather than by anything that happened during sampling, and it can be the sole cause of OOM at high resolutions, even if KSampler worked fine.

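Add the tiled VAE node:

Recent ComfyUI builds ship VAE Decode (Tiled) as a core node (class name VAEDecodeTiled), so it normally needs no install
If your build lacks it, open ComfyUI Manager (hamburger menu, top right), search for VAEDecodeTiled, and install it
In your workflow, replace VAEDecode with VAEDecodeTiled
Connect the latent output from KSampler to VAEDecodeTiled (same inputs as before)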

At 768px and below, there’s no speed difference. Above 1024px, tiled VAE can be the difference between OOM and success. This is particularly effective for Flux models, where VAE decoding at high resolutions is memory-intensive.

💡 Tip: The VAE runs at the end of the process and can fail even if the KSampler went fine. Swap VAEDecode for VAEDecodeTiled or use a quantized VAE to avoid it.
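
To see why tiling helps, it is enough to look at how an image gets split: each tile is decoded on its own, so peak memory depends on the tile size rather than the full image. The sketch below is a generic illustration of overlapping tile layout, not ComfyUI's actual implementation, and the tile and overlap values are assumptions.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` covering [0, length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # final tile sits flush with the edge
    return starts


def tile_boxes(width, height, tile=512, overlap=64):
    """(x0, y0, x1, y1) boxes that together cover the whole image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


print(len(tile_boxes(1024, 1024)), "tiles for 1024x1024")
```

The overlap exists so neighbouring tiles can be blended at the seams. Smaller tiles lower peak memory further at the cost of more decode passes.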

Solution 5: Switch to Quantized GGUF Models

Quantization is the definitive fix for low-VRAM GPUs. GGUF models compress weights from full precision to lower bit-depths with minimal quality loss. If none of this is enough and you’re still hitting walls on your current card, check our best GPU for ComfyUI guide for what to upgrade to.

For Flux Dev, here’s the memory breakdown:

Format	VRAM (loaded)	System RAM	Quality
Full bf16	24GB	24GB+	100% (reference)
fp8	12GB	12GB	~99%
GGUF Q8_0	8-10GB	8-10GB	~98%
GGUF Q4_K_M	6-8GB	6-8GB	~95%
GGUF Q3_K_M	5-6GB	5-6GB	~90%

To use GGUF models:

Install ComfyUI-GGUF via ComfyUI Manager (search “GGUF”)
Download a quantized GGUF model (HuggingFace hosts many)
Place it in ComfyUI/models/unet/
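Replace CheckpointLoaderSimple with UnetLoaderGGUF in your workflow; it loads only the diffusion model, so load the text encoders and VAE through their own loader nodes (for example DualCLIPLoader and VAELoader)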
Select the GGUF file from the dropdown

Text encoders can also be quantized. For example, T5XXL drops from 9GB (fp16) to 4.5GB (fp8). This compounds the memory savings.

GGUF Q4_K_M is the sweet spot for most users: about 95% quality at 6-8GB, roughly a quarter to a third of the 24GB that full bf16 needs.
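
The sizes in the table track a simple rule of thumb: weight storage is parameter count times bits per weight. The sketch below assumes Flux Dev has roughly 12 billion parameters and uses nominal bit widths; the function name is made up for this example.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


FLUX_DEV_PARAMS_B = 12  # assumption: roughly 12 billion parameters

for label, bits in (("bf16", 16), ("fp8", 8), ("4-bit (nominal)", 4)):
    print(f"{label}: ~{weights_gb(FLUX_DEV_PARAMS_B, bits):.0f}GB")
```

Real GGUF formats store scaling data alongside the weights, so they use somewhat more than their nominal bit width. That is one reason a 4-bit quantization lands above the 6GB this estimate gives.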

Solution 6: Update NVIDIA Drivers (2025-2026 Models)

If OOM appears specifically with Flux.2, LTX 2.3, or other late-2025/2026 models, it may be a driver issue, not VRAM.

These newer architectures require NVIDIA Studio Driver 595+ (starting April 2026). Older drivers crash with new architectures even if VRAM is sufficient.

Fix:

Download the Studio Driver (not Game Ready) from nvidia.com
Uninstall your current NVIDIA driver
Install the Studio Driver
Reinstall PyTorch for your CUDA version:
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```
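(Replace cu124 with the tag for your CUDA version, such as cu121 for 12.1 or cu126 for 12.6. PyTorch publishes wheels only for selected CUDA versions, so check the install selector on pytorch.org for a tag that exists.)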

Restart ComfyUI. The error should resolve.
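
If you script the reinstall, a small helper can build the index URL from a CUDA tag and report which CUDA version the installed PyTorch was built for. This is a sketch with hypothetical helper names. It checks only the tag's format, not whether PyTorch actually publishes wheels for it.

```python
import re

PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"


def pytorch_index_url(cuda_tag):
    """Build the --index-url value for a CUDA tag such as 'cu124'."""
    if not re.fullmatch(r"cu\d{3}", cuda_tag):
        raise ValueError(f"expected a tag like 'cu124', got {cuda_tag!r}")
    return f"{PYTORCH_WHEEL_BASE}/{cuda_tag}"


def installed_torch_info():
    """Return (torch version, CUDA version torch was built for), or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__, torch.version.cuda


print(pytorch_index_url("cu124"))
```

Running installed_torch_info() after the reinstall is a quick way to confirm that the CUDA build you asked for is the one Python actually imports.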

Recommended Configurations by GPU

4GB GPUs (GTX 1650, RTX 3050 4GB)

python main.py --lowvram

Models: SD 1.5 or Flux GGUF Q3_K_M
Max resolution: 512×512 (SD 1.5) or 640×640 (Flux GGUF)
Always enable: VAE tiling

6GB GPUs (RTX 3060 6GB, RTX 4060 8GB restricted)

python main.py --medvram

Models: SDXL with --lowvram or Flux GGUF Q4_K_M
Max resolution: 768×768
VAE tiling: Recommended above 768px

8GB GPUs (RTX 3070, RTX 4060 Ti)

python main.py --medvram

Models: SDXL or Flux GGUF Q8_0
Max resolution: 1024×1024
VAE tiling: Only needed for Flux at 1024px+

12GB GPUs (RTX 3060 12GB, RTX 4070)

python main.py

(No flags needed for most models)

Models: Flux Dev fp8 runs comfortably
Max resolution: 1024×1024 without issues, 1280×1280 with VAE tiling

FAQ

Q: Why does CUDA out of memory happen in ComfyUI? A: Because the model, resolution or batch size exceed the VRAM available on your GPU. It’s very common on 4-8GB GPUs when trying to run SDXL or Flux without optimizations.

Q: Does --lowvram work with any NVIDIA GPU? A: Yes. The --lowvram flag makes ComfyUI offload parts of the model to system RAM when VRAM fills up. It’s slower but lets you generate images that would otherwise cause OOM.

Q: Do quantized GGUF models reduce quality? A: Barely. The Q8_0 and fp8 formats are practically indistinguishable from the original. Q4_K_M reduces quality a bit more but fits on 6-8GB GPUs where the full model simply won’t run at all.

Q: Can the OOM error happen only in the VAE even if the model loads fine? A: Yes. The VAE runs at the end of the process and can fail even if the KSampler went fine. Swap VAEDecode for VAEDecodeTiled or use a quantized VAE to avoid it.

Q: Does restarting ComfyUI free up VRAM? A: Yes. VRAM fragmentation is a real problem in long sessions. Restarting the process cleans memory completely and often fixes OOM errors that suddenly appear in workflows that used to work.

Keep Reading

For a deeper dive into quantization as a VRAM-saving technique, see our GGUF models guide — it walks through running Flux on 8GB cards. If your own hardware still isn’t enough, renting a cloud GPU on RunPod or Vast.ai is a realistic option before buying new hardware.

🏆 Our Recommendation

If you’re on a 4-6GB GPU with SDXL or Flux: Start with --lowvram, reduce resolution to 640-768px, and enable VAE tiling. This combination solves 90% of OOM errors without sacrificing too much speed.

If you’re on an 8GB GPU: Use --medvram as your baseline. Switch to GGUF quantization (Q4_K_M) if you want to run Flux Dev, which does not fit at full precision, or to push past 1024px resolution.

If you want a permanent, long-term solution: Invest time in GGUF quantization. Q4_K_M models keep about 95% quality at roughly a quarter to a third of the full-precision memory footprint, so they solve the root problem rather than working around it with flags.

If OOM suddenly appears in workflows that used to work: Always restart ComfyUI first. VRAM fragmentation fixes itself with a clean process restart, and you’ll save hours of troubleshooting.
