When you hit the torch.cuda.OutOfMemoryError: CUDA out of memory message mid-generation, it feels like a dead end—but it’s not. Your GPU has the compute power to run ComfyUI, but the VRAM math doesn’t work out. Whether you’re running a 4GB budget card or a 12GB powerhouse, CUDA out of memory in ComfyUI has a practical fix—often several—that will get you generating again without upgrading hardware. For the full picture on every VRAM-saving technique available, see our complete guide to reducing VRAM usage in ComfyUI.
The error happens because your GPU’s VRAM must simultaneously hold the diffusion model (checkpoint/UNET), the text encoder (CLIP or T5XXL), the VAE decoder, the latent buffer during sampling, and the final image framebuffer. A full SDXL model at 6-7GB plus VAE at 800MB leaves an 8GB GPU extremely tight. Full Flux Dev (about 24GB in bf16, and still about 12GB in fp8) is outright impossible on most consumer cards without optimization. The ComfyUI OOM error can strike at three different moments (model loading, KSampler inference, or VAE decoding), and each requires a slightly different approach.
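The budget arithmetic described above can be sketched in a few lines. This is only an illustration: the component sizes are the approximate figures quoted in this article (using the midpoint of the 6-7GB SDXL range), and the function name is hypothetical.

```python
# Rough VRAM budget check. All sizes are approximate figures in GB
# quoted in this article, not measured values.

def vram_headroom_gb(gpu_vram_gb, components_gb):
    """Return VRAM left (GB) after loading every component; negative means it cannot fit."""
    return gpu_vram_gb - sum(components_gb.values())

# Hypothetical SDXL setup: midpoint of the 6-7GB model range plus an 800MB VAE.
sdxl_setup = {
    "diffusion_model": 6.5,
    "vae": 0.8,
}

headroom = vram_headroom_gb(8.0, sdxl_setup)
print(f"Left for text encoder, latents and framebuffer: {headroom:.1f} GB")
```

Whatever is left over has to cover the text encoder, the latent buffer and the output image, which is why an 8GB card feels tight with SDXL even though the weights alone fit.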
Quick Reference: Solutions at a Glance
| Solution | VRAM Saved | Speed Impact | Best For |
|---|---|---|---|
| Restart ComfyUI | 2-4GB | None | Fragmentation issues |
| --lowvram flag | 3-5GB | 20-40% slower | 4-6GB GPUs |
| --medvram flag | 1-3GB | 5-15% slower | 6-8GB GPUs |
| Reduce resolution | 1-4GB | None | All GPUs |
| Tiled VAE | 1-2GB | None at 768px and below | High-res generation |
| GGUF quantization | 4-8GB | 10-20% slower | Permanent solution |
Understanding Where the Memory Error Occurs
Before jumping to fixes, identify when the ComfyUI low VRAM error happens. This tells you which component is running out of space.
Model loading errors appear immediately when you load a checkpoint or GGUF file. The model weights themselves don’t fit in VRAM.
KSampler errors occur during the sampling/generation loop. Your resolution, batch size, or sampling steps exceed available memory during inference.
VAE decode errors happen at the final step, even if sampling completed successfully. The VAE alone can’t fit the full-resolution tensor in memory.
💡 Tip: Pinpointing whether your error happens at model load, during sampling, or in VAE decoding tells you which fix to try first—and saves time troubleshooting.
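As a rough aid for that triage, the sketch below guesses the failing stage from the text of a traceback. The keyword matching is an assumption made for illustration and is not based on ComfyUI's actual log format, so treat the result as a hint and read the traceback yourself.

```python
def classify_oom_stage(traceback_text):
    """Guess which stage ran out of memory from a pasted traceback (keyword heuristic)."""
    text = traceback_text.lower()
    if "out of memory" not in text:
        return "not_oom"
    # Check the VAE first: decoder layers can contain words like "upsample",
    # which would otherwise match the sampling check below.
    if "vae" in text and "decode" in text:
        return "vae_decode"
    if "sampl" in text:
        return "sampling"
    if "load" in text:
        return "model_load"
    return "unknown"

# Hypothetical traceback excerpt, shortened for the example.
example = "File nodes.py, in sample\ntorch.cuda.OutOfMemoryError: CUDA out of memory"
print(classify_oom_stage(example))
```

A "model_load" result points toward quantization or startup flags, "sampling" toward lower resolution or batch size, and "vae_decode" toward tiled decoding.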
Solution 1: Restart ComfyUI Completely
Before trying anything else, fully restart the application.
Don’t just reload the browser page. Close the ComfyUI process entirely—use Task Manager (Windows), Activity Monitor (macOS), or pkill -f "main.py" on Linux. Then restart with python main.py.
Long sessions fragment VRAM. PyTorch’s caching allocator holds on to memory it has already reserved instead of handing it back to the driver, and over time those reservations break up into blocks that are too small for a large new allocation. A clean restart releases all of it, often several gigabytes, and is frequently enough on its own for OOM errors that appear suddenly in workflows that used to work fine.
💡 Tip: VRAM fragmentation is real—a full restart often fixes OOM errors that suddenly appear in workflows that used to work, with zero configuration needed.
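One way to confirm that a restart really freed memory is to compare device-wide free VRAM before and after. The sketch below uses `torch.cuda.mem_get_info()`, which reports free and total bytes for the whole device; it assumes you run it with the same Python environment ComfyUI uses, and the helper names are made up for the example.

```python
GIB = 1024 ** 3

def describe_vram(free_bytes, total_bytes):
    """Format device-wide VRAM usage as a one-line summary."""
    used_bytes = total_bytes - free_bytes
    return (f"{used_bytes / GIB:.1f} GiB used of {total_bytes / GIB:.1f} GiB "
            f"({free_bytes / GIB:.1f} GiB free)")

def report_current_gpu():
    """Return a summary for the default CUDA device, or None if none is visible."""
    try:
        import torch  # assumption: ComfyUI's Python environment has PyTorch installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return describe_vram(free_bytes, total_bytes)

if __name__ == "__main__":
    print(report_current_gpu() or "No CUDA device visible from this Python environment")
```

Run it once while the sluggish session is still open and again after the restart; the difference in the "used" figure is what the old process was holding.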
Solution 2: Use Memory Optimization Flags at Startup
ComfyUI includes built-in flags that trade speed for VRAM efficiency. Run these at startup:
--lowvram (4-6GB GPUs)
Offloads model parts to system RAM when VRAM fills. Slower—expect 20-40% longer generation times—but enables models that wouldn’t otherwise fit. Essential for 4GB and 6GB cards.
--medvram (6-8GB GPUs)
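Balances speed and memory. Offloads less aggressively than --lowvram but more than default. Recommended for 6GB and 8GB cards as a starting point. This flag originated in the Automatic1111 WebUI, so confirm that your ComfyUI build accepts it by running python main.py --help; if it is rejected, launch without a VRAM flag, since ComfyUI’s default mode already offloads automatically when memory runs short.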
--novram (extreme cases, 4GB minimum)
Moves nearly everything to system RAM. Very slow but works on minimal GPUs. Use only if --lowvram fails.
--cpu (no dedicated GPU)
Runs entirely on CPU. Extremely slow (10-100x slower than GPU). Use only for testing workflows without a dedicated graphics card.
Never combine --lowvram and --medvram in the same command. Pick one.
Example startup commands:
python main.py --lowvram
python main.py --medvram
python main.py --novram
If you’re unsure which to use, start with --medvram for 6-8GB cards or --lowvram for 4-6GB cards.
💡 Tip: The --lowvram flag makes ComfyUI offload parts of the model to system RAM when VRAM fills up. It’s slower but lets you generate images that would otherwise cause OOM on any NVIDIA GPU.
Solution 3: Reduce Generation Resolution
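VRAM consumption during sampling scales with pixel count, so it grows quadratically with the image’s side length. Dropping from 1024px to 768px cuts the pixel count to about 56%, which can nearly halve the resolution-dependent memory use.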
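The scaling is easy to check with a little arithmetic. The sketch below expresses each resolution as a multiple of the 512×512 pixel count. It only models the resolution-dependent part of memory use (latents and activations); model weights are a fixed cost on top, and the function name is illustrative.

```python
def relative_vram_load(width, height, base=512):
    """Pixel count relative to a base x base image (1.0 means the same load as 512x512)."""
    return (width * height) / (base * base)

for side in (512, 640, 768, 896, 1024, 1280):
    print(f"{side}x{side}: {relative_vram_load(side, side):.2f}x")

# Going from 1024px down to 768px keeps this fraction of the pixels:
fraction_kept = relative_vram_load(768, 768) / relative_vram_load(1024, 1024)
print(f"768px vs 1024px: {fraction_kept:.4f}")
```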
In the Empty Latent Image node, adjust width and height downward:
- SD 1.5: Works 512-1024px on 4-8GB GPUs (use multiples of 64)
- SDXL: Needs --lowvram at 768px on 6GB, or --medvram at 1024px on 8GB (impossible on 4GB without flags)
- Flux GGUF Q4: Works 512px with --lowvram on 4GB, up to 1024px on 8GB
- Flux Dev (full): Impossible on any of these without quantization
Recommended resolution steps, in multiples of 64 (SD/SDXL) or 16 (Flux), with VRAM load shown relative to 512×512:
| Resolution | VRAM Load (relative) | Best GPU |
|---|---|---|
| 512×512 | 1× | 4GB+ |
| 640×640 | 1.56× | 4GB+ (Flux GGUF) |
| 768×768 | 2.25× | 6GB+ |
| 896×896 | 3.06× | 8GB+ |
| 1024×1024 | 4× | 8GB+ |
| 1280×1280 | 6.25× | 12GB+ |
Start 128-192px lower than you think you need. You can always increase it after confirming stability.
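A small helper can apply that advice while keeping the dimensions on the grid the model expects. This is a sketch with hypothetical function names; the 128px reduction and the multiples of 64 and 16 are the values mentioned above.

```python
def snap_down(value, multiple):
    """Round value down to the nearest multiple, never going below one multiple."""
    return max((value // multiple) * multiple, multiple)

def step_down_resolution(width, height, reduce_by=128, multiple=64):
    """Shrink both sides by reduce_by pixels, then snap each to the required multiple."""
    return (snap_down(width - reduce_by, multiple),
            snap_down(height - reduce_by, multiple))

# SD/SDXL style: multiples of 64.
print(step_down_resolution(1024, 1024))
# Flux style: multiples of 16, starting from a non-square size.
print(step_down_resolution(1000, 700, reduce_by=128, multiple=16))
```

Enter the resulting width and height in the Empty Latent Image node, then step back up once the workflow runs without errors.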
Solution 4: Enable Tiled VAE Decoding
VAE decoding needs its own block of VRAM, separate from sampling, and that block grows with output resolution. It can be the sole cause of OOM at high resolutions, even if KSampler worked fine.
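Add the tiled VAE node:
- Double-click the canvas and search for VAEDecodeTiled (shown as “VAE Decode (Tiled)”); recent ComfyUI builds ship it as a core node
- If it is missing, open ComfyUI Manager (hamburger menu, top right), search for VAEDecodeTiled, and install it
- In your workflow, replace VAEDecode with VAEDecodeTiled
- Connect the latent output from KSampler to VAEDecodeTiled (same inputs as before)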
At 768px and below, there’s no speed difference. Above 1024px, tiled VAE can be the difference between OOM and success. This is particularly effective for Flux models, where VAE decoding at high resolutions is memory-intensive.
💡 Tip: The VAE runs at the end of the process and can fail even if the KSampler went fine. Swap VAEDecode for VAEDecodeTiled or use a quantized VAE to avoid it.
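To see why tiling lowers peak memory, it helps to look at how an image can be cut into overlapping tiles that are decoded one at a time. The sketch below only computes the tile rectangles; it is not ComfyUI's implementation, and the 512px tile size and 64px overlap are assumed values chosen for illustration.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover the image with overlapping tiles."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), stride):
        for left in range(0, max(width - overlap, 1), stride):
            # Clamp the last tile in each row and column to the image edge.
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes

boxes = tile_boxes(1024, 1024)
print(len(boxes), "tiles, first:", boxes[0], "last:", boxes[-1])
```

Only one tile's worth of activations has to be resident at a time, so peak memory depends on the tile size rather than the full image. The overlap gives the decoder room to blend neighbouring tiles and hide seams.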
Solution 5: Switch to Quantized GGUF Models
Quantization is the definitive fix for low-VRAM GPUs. GGUF models compress weights from full precision to lower bit-depths with minimal quality loss. If none of this is enough and you’re still hitting walls on your current card, check our best GPU for ComfyUI guide for what to upgrade to.
For Flux Dev, here’s the memory breakdown:
| Format | VRAM (loaded) | System RAM | Quality |
|---|---|---|---|
| Full bf16 | 24GB | 24GB+ | 100% (reference) |
| fp8 | 12GB | 12GB | ~99% |
| GGUF Q8_0 | 8-10GB | 8-10GB | ~98% |
| GGUF Q4_K_M | 6-8GB | 6-8GB | ~95% |
| GGUF Q3_K_M | 5-6GB | 5-6GB | ~90% |
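The first rows of the table follow from simple arithmetic: weight memory is roughly parameter count times bits per weight. The sketch below assumes Flux Dev has about 12 billion parameters and uses a nominal 4.5 bits per weight for a 4-bit block quantization (4-bit values plus per-block scale data). Real GGUF files mix precisions per layer, so actual sizes vary.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring activations and overhead."""
    return params_billions * bits_per_weight / 8

FLUX_DEV_PARAMS_B = 12  # assumption: roughly 12 billion parameters

for name, bits in (("bf16", 16), ("fp8", 8), ("4-bit quant (nominal 4.5 bits)", 4.5)):
    print(f"{name}: {weights_gb(FLUX_DEV_PARAMS_B, bits):.2f} GB")
```

The same function works for text encoders: halving the bits per weight halves the weight memory, which is where the fp16 to fp8 savings come from.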
To use GGUF models:
- Install ComfyUI-GGUF via ComfyUI Manager (search “GGUF”)
- Download a quantized GGUF model (HuggingFace hosts many)
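- Place it in ComfyUI/models/unet/
- Replace CheckpointLoaderSimple with UnetLoaderGGUF in your workflow, and load the text encoders and VAE with their own loader nodes, since a GGUF file holds only the diffusion model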
- Select the GGUF file from the dropdown
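If the file does not show up in the dropdown, a quick check of the folder can rule out a misplaced download. This sketch assumes the ComfyUI/models/unet/ location given in the steps above; the function name and the example root path are hypothetical.

```python
from pathlib import Path

def list_gguf_models(comfyui_root):
    """Return the sorted names of .gguf files under <comfyui_root>/models/unet/."""
    unet_dir = Path(comfyui_root) / "models" / "unet"
    if not unet_dir.is_dir():
        return []
    return sorted(p.name for p in unet_dir.glob("*.gguf"))

# Hypothetical install location; point this at your own ComfyUI folder.
found = list_gguf_models("ComfyUI")
print(found if found else "No .gguf files found in models/unet/")
```

An empty result usually means the file landed in a different models subfolder or still has a partial-download extension.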
Text encoders can also be quantized. For example, T5XXL drops from 9GB (fp16) to 4.5GB (fp8). This compounds the memory savings.
GGUF Q4_K_M is the sweet spot for most users: about 95% quality at roughly a quarter to a third of the full-precision memory footprint (6-8GB versus 24GB in the table above).
Solution 6: Update NVIDIA Drivers (2025-2026 Models)
If OOM appears specifically with Flux.2, LTX 2.3, or other late-2025/2026 models, it may be a driver issue, not VRAM.
These newer architectures require NVIDIA Studio Driver 595+ (starting April 2026). Older drivers crash with new architectures even if VRAM is sufficient.
Fix:
- Download the Studio Driver (not Game Ready) from nvidia.com
- Uninstall your current NVIDIA driver
- Install the Studio Driver
- Reinstall PyTorch for your CUDA version:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
(Replace cu124 with the tag for your CUDA version: cu121 for 12.1, cu118 for 11.8, etc.)
Restart ComfyUI. The error should resolve.
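The wheel tag in that URL is just the CUDA version with the dot removed. The sketch below builds the index URL from a version string. PyTorch only publishes wheels for selected CUDA versions, so a generated URL is a candidate to verify, not a guarantee that the index exists; the function names are illustrative.

```python
def cuda_wheel_tag(cuda_version):
    """Turn a CUDA version such as '12.4' into a PyTorch wheel tag such as 'cu124'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{int(major)}{int(minor)}"

def pytorch_index_url(cuda_version):
    """Build the --index-url value for pip from a CUDA version string."""
    return f"https://download.pytorch.org/whl/{cuda_wheel_tag(cuda_version)}"

print(pytorch_index_url("12.4"))
```

After reinstalling, `python -c "import torch; print(torch.version.cuda)"` shows which CUDA version the installed PyTorch build targets.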
Recommended Configurations by GPU
4GB GPUs (GTX 1650, RTX 3050 4GB)
python main.py --lowvram
- Models: SD 1.5 or Flux GGUF Q3_K_M
- Max resolution: 512×512 (SD 1.5) or 640×640 (Flux GGUF)
- Always enable: VAE tiling
6GB GPUs (RTX 3060 6GB, RTX 4060 8GB restricted)
python main.py --medvram
- Models: SDXL with --lowvram or Flux GGUF Q4_K_M
- Max resolution: 768×768
- VAE tiling: Recommended above 768px
8GB GPUs (RTX 3070, RTX 4060 Ti)
python main.py --medvram
- Models: SDXL or Flux GGUF Q8_0
- Max resolution: 1024×1024
- VAE tiling: Only needed for Flux at 1024px+
12GB GPUs (RTX 3060 12GB, RTX 4070)
python main.py
(No flags needed for most models)
- Models: Flux Dev fp8 runs comfortably
- Max resolution: 1024×1024 without issues, 1280×1280 with VAE tiling
FAQ
Q: Why does CUDA out of memory happen in ComfyUI? A: Because the model, resolution or batch size exceed the VRAM available on your GPU. It’s very common on 4-8GB GPUs when trying to run SDXL or Flux without optimizations.
Q: Does --lowvram work with any NVIDIA GPU? A: Yes. The --lowvram flag makes ComfyUI offload parts of the model to system RAM when VRAM fills up. It’s slower but lets you generate images that would otherwise cause OOM.
Q: Do quantized GGUF models reduce quality? A: Barely. The Q8_0 and fp8 formats are practically indistinguishable from the original. Q4_K_M reduces quality a bit more but fits on 4-6GB GPUs where the full model simply won’t run at all.
Q: Can the OOM error happen only in the VAE even if the model loads fine? A: Yes. The VAE runs at the end of the process and can fail even if the KSampler went fine. Swap VAEDecode for VAEDecodeTiled or use a quantized VAE to avoid it.
Q: Does restarting ComfyUI free up VRAM? A: Yes. VRAM fragmentation is a real problem in long sessions. Restarting the process cleans memory completely and often fixes OOM errors that suddenly appear in workflows that used to work.
Keep Reading
For a deeper dive into quantization as a VRAM-saving technique, see our GGUF models guide — it walks through running Flux on 8GB cards. If your own hardware still isn’t enough, renting a cloud GPU on RunPod or Vast.ai is a realistic option before buying new hardware.
🏆 Our Recommendation
If you’re on a 4-6GB GPU with SDXL or Flux: Start with --lowvram, reduce resolution to 640-768px, and enable VAE tiling. This combination resolves most OOM errors without sacrificing too much speed.
If you’re on an 8GB GPU: Use --medvram as your baseline. Switch to GGUF quantization (Q4_K_M) only if you need to run Flux Dev or want to push past 1024px resolution.
If you want a permanent, long-term solution: Invest time in GGUF quantization. Q4_K_M models keep about 95% quality at roughly a quarter to a third of the full-precision memory footprint, so they solve the root problem rather than working around it with flags.
If OOM suddenly appears in workflows that used to work: Always restart ComfyUI first. VRAM fragmentation fixes itself with a clean process restart, and you’ll save hours of troubleshooting.