GPU VRAM Requirement Calculator
The video memory a machine-learning model needs starts with a simple product: the number of parameters times the bytes used to store each one. A 7 billion parameter model in 16-bit precision needs about 14 GB just for weights. Real workloads add memory for activations and caches, captured here by an overhead multiplier you control. Enter the parameter count in billions, the bytes per parameter for your precision, and an overhead factor. This calculator returns the weight memory, the total estimated VRAM, and the headroom against a card size you specify. Every assumption is an editable input.
VRAM requirement formula
Weight bytes = parameters (billions) * 1e9 * bytes per parameter
Weight memory (GB) = weight bytes / 1e9 (decimal GB)
Total VRAM (GB) = weight memory * overhead multiplier
Headroom = your GPU VRAM - total VRAM
Fits = total VRAM is less than or equal to your GPU VRAM
Bytes per parameter: FP32 = 4, FP16 or BF16 = 2, INT8 = 1, 4-bit = 0.5. The overhead multiplier covers activations and caches.
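The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the names `VramEstimate` and `estimate_vram` are hypothetical, and the example inputs (7 billion parameters, FP16, a 1.2 overhead multiplier, a 24 GB card) are assumed values chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class VramEstimate:
    weight_gb: float    # memory for weights alone, in decimal GB
    total_gb: float     # weight memory scaled by the overhead multiplier
    headroom_gb: float  # card size minus total; negative means it does not fit
    fits: bool          # True when total VRAM <= card VRAM


def estimate_vram(params_billions: float, bytes_per_param: float,
                  overhead: float, gpu_vram_gb: float) -> VramEstimate:
    """Apply the formulas above using decimal GB (1 GB = 1e9 bytes)."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    weight_gb = weight_bytes / 1e9
    total_gb = weight_gb * overhead
    headroom_gb = gpu_vram_gb - total_gb
    return VramEstimate(weight_gb, total_gb, headroom_gb, total_gb <= gpu_vram_gb)


# Assumed example inputs: 7B parameters, FP16 (2 bytes), 1.2x overhead, 24 GB card.
example = estimate_vram(7, 2, 1.2, 24)
print(f"weights {example.weight_gb:.1f} GB, total {example.total_gb:.1f} GB, "
      f"headroom {example.headroom_gb:.1f} GB, fits: {example.fits}")
```

With these inputs the weights come to 14 GB and the total to about 16.8 GB, leaving roughly 7.2 GB of headroom on a 24 GB card.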
VRAM context
- Weight memory scales directly with parameter count and bytes per parameter.
- 16-bit (FP16 or BF16) is a common inference precision at 2 bytes per parameter.
- Quantizing from 16-bit to 8-bit or 4-bit cuts weight memory to roughly one half or one quarter, respectively.
- Training needs far more memory than inference because it also stores gradients and optimizer state; set the overhead multiplier higher.
- All inputs are user-editable so the estimate reflects your exact model and setup.
GPU VRAM calculator: frequently asked questions
How much VRAM does a model need to load?
The weights alone need parameters times bytes per parameter. A 7 billion parameter model in 16-bit precision (2 bytes each) needs about 14 GB just for weights. Inference also needs memory for activations and the key-value cache, and training needs several times more for gradients and optimizer state.
How many bytes does each numeric precision use?
FP32 uses 4 bytes per parameter, FP16 and BF16 use 2 bytes, 8-bit quantization (INT8) uses 1 byte, and 4-bit quantization uses 0.5 bytes. Lower precision reduces memory at some cost to accuracy. Enter the bytes per parameter that match your chosen format.
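As a rough illustration, the byte widths listed above can be kept in a lookup table and applied to a parameter count. The dictionary `BYTES_PER_PARAM` and the function `weight_memory_gb` are hypothetical names for this sketch, and the 7 billion parameter count is an assumed example.

```python
# Bytes per parameter for each precision, as listed in this section.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "BF16": 2.0, "INT8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Weight memory in decimal GB (1 GB = 1e9 bytes) for a named precision."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9


# Assumed example: the same 7B-parameter model at each precision.
for precision in BYTES_PER_PARAM:
    print(f"{precision:>5}: {weight_memory_gb(7, precision):.1f} GB")
```

These figures cover weights only; the overhead multiplier still has to be applied to estimate total VRAM.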
Why add an overhead factor?
Loading weights is not the only memory cost. Activations, attention key-value caches, temporary buffers, and framework allocator fragmentation all consume VRAM during inference. A multiplier of roughly 1.2 covers light inference overhead; training overhead is far higher and depends on batch size and optimizer.
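One way to reason about the multiplier is to invert the fit condition: since total VRAM is weight memory times overhead, the largest multiplier a card can absorb is the card size divided by the weight memory. The sketch below assumes that rearrangement; `max_overhead` is a hypothetical helper, and the 7 billion parameter FP16 model on a 24 GB card is an assumed example.

```python
def max_overhead(params_billions: float, bytes_per_param: float,
                 gpu_vram_gb: float) -> float:
    """Largest overhead multiplier for which total VRAM <= card VRAM."""
    weight_gb = params_billions * 1e9 * bytes_per_param / 1e9
    return gpu_vram_gb / weight_gb


# Assumed example: 7B parameters at 2 bytes each on a 24 GB card.
limit = max_overhead(7, 2, 24)
print(f"overhead multiplier must stay at or below {limit:.2f}")
```

If the limit comes out well above the multiplier you expect for your workload, the card has room to spare; if it is below 1, the weights alone do not fit.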
Why are the parameter count and overhead user-editable?
Memory use depends on your specific model, batch size, sequence length, and framework, none of which we can know for you. This tool exposes every input so you supply real values rather than relying on a hidden assumption. It estimates VRAM from the numbers you provide.
Does quantization always fit a model in less memory?
Quantization lowers the bytes per parameter, so weight memory shrinks in proportion: a 7 billion parameter model at 4-bit needs about 3.5 GB for weights versus 14 GB at 16-bit. That does not guarantee the model fits on your card, because the total also includes activation and cache overhead, which this calculator accounts for through the overhead multiplier.
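A short sketch can show why smaller weights alone do not settle the question. It checks three precisions against one card using the same formulas as the calculator; `precisions_that_fit` is a hypothetical helper, and the inputs (7 billion parameters, a 1.2 overhead multiplier, an 8 GB card) are assumed for illustration.

```python
def precisions_that_fit(params_billions: float, overhead: float,
                        gpu_vram_gb: float) -> dict[str, bool]:
    """Report, per precision, whether total VRAM <= card VRAM."""
    formats = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}  # bytes per parameter
    result = {}
    for name, bytes_per_param in formats.items():
        weight_gb = params_billions * 1e9 * bytes_per_param / 1e9
        total_gb = weight_gb * overhead
        result[name] = total_gb <= gpu_vram_gb
    return result


# Assumed example: 7B parameters, 1.2x overhead, 8 GB card.
fits = precisions_that_fit(7, 1.2, 8)
for name, ok in fits.items():
    print(f"{name}: {'fits' if ok else 'does not fit'}")
```

In this example the 8-bit weights (7 GB) are smaller than the card, but the overhead multiplier pushes the total past 8 GB, so only the 4-bit version fits.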
Official sources
- IEEE: IEEE 754 floating-point arithmetic standard (byte widths).
- NIST: National Institute of Standards and Technology, units of digital information.
Reviewed by the CalculatorHub team, edited by James Graham, 17 June 2026. See our methodology.