GPU Memory for Model Size Calculator

Reviewed by the CalculatorHub team, edited by James Graham. Memory = parameters * bytes per parameter * overhead. Last verified 19 June 2026.

Whether a neural network fits on a given GPU comes down to memory: the weights take parameter count times the bytes per parameter, and running the model needs some extra on top. Enter the parameter count in billions, choose the numeric precision, and set an overhead factor for activations and buffers. This calculator returns the weight memory and the total estimated memory in gigabytes and gibibytes, so you can check it against a card's VRAM before you deploy.

Parameters (billions)

Precision (bytes per parameter)

Overhead factor (e.g. 1.2)

Weights memory (GB)

0.00

Total estimated memory (GB)

0.00

Total (GiB, binary)

0.00

Bytes per parameter used

0.00

GPU memory formula

weight bytes = parameters * bytes per parameter
weights GB = weight bytes / 1,000,000,000
total GB = weights GB * overhead factor
total GiB = total bytes / 1,073,741,824

Parameters are entered in billions, so a 7 means 7,000,000,000 parameters. GB uses the decimal 10 to the 9th; GiB uses the binary 2 to the 30th, which is how some tools report VRAM.

Weight memory by precision (7B model)

FP32 (4 bytes): about 28 GB.
FP16 / BF16 (2 bytes): about 14 GB.
INT8 (1 byte): about 7 GB.
INT4 (0.5 bytes): about 3.5 GB.
Add the overhead factor on top for activations and buffers at run time.

GPU memory: frequently asked questions

How much GPU memory does a model need for its weights?

The weights alone take parameter count times bytes per parameter. A 7 billion parameter model at 16-bit precision (2 bytes each) needs about 14 gigabytes just for weights. Running it also needs extra memory for activations, key-value cache, and framework buffers, which the overhead factor accounts for.

How many bytes per parameter does each precision use?

32-bit floating point uses 4 bytes, 16-bit (FP16 or BF16) uses 2 bytes, 8-bit integer uses 1 byte, and 4-bit quantization uses about 0.5 bytes per parameter. Lower precision shrinks the memory footprint at some cost to accuracy.

What overhead factor should I use?

For inference, a factor of about 1.2 (20 percent over the weight size) is a common starting estimate for activations and runtime buffers. Training needs far more, often 3 to 4 times the weight memory, because it also stores gradients and optimizer states. The factor is user-editable so you can match your workload.

Why is the estimate approximate?

Actual usage depends on batch size, sequence length, the key-value cache, the framework, and memory fragmentation. This calculator gives a planning estimate for whether a model fits a given card; always test with headroom before committing.

Sources and definitions

Weight memory is parameter count times bytes per parameter, where the bytes follow the IEEE 754 and integer storage sizes for each numeric format. These are standard definitions.
National Institute of Standards and Technology: SI prefixes and binary prefixes (giga, gibi).

Reviewed by the CalculatorHub team, edited by James Graham, 19 June 2026. See our methodology.