Server Capacity Calculator
Before you provision infrastructure, it helps to know roughly how much traffic one server can carry and how many servers a peak load needs. This calculator applies Little's Law: throughput equals concurrent workers divided by average request latency. Enter the workers per server, the average latency in milliseconds, your target peak requests per second, and a headroom percentage so that servers do not run flat out. It returns the safe per-server throughput, the servers required (rounded up), and the resulting utilization. Little's Law itself is exact, but the result is only as good as the inputs, so validate it with load testing before relying on it in production.
Server capacity formula
Latency (s) = latency (ms) / 1,000
Max throughput (req/s) = workers / latency (s)
Safe throughput (req/s) = max throughput * (1 - headroom % / 100)
Servers required = ceil(peak req/s / safe throughput)
Utilization (%) = peak req/s / (servers required * max throughput) * 100
Little's Law gives the maximum sustainable throughput per server. The headroom margin derates it so bursts do not saturate the system. Servers are rounded up because you deploy whole machines.
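The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the names `CapacityPlan` and `plan_capacity` are hypothetical, and the choice to report at least one server when the peak is zero is an assumption made here to avoid a division by zero.

```python
import math
from dataclasses import dataclass


@dataclass
class CapacityPlan:
    max_rps: float          # theoretical per-server throughput
    safe_rps: float         # per-server throughput after headroom
    servers: int            # whole servers needed for the peak
    utilization_pct: float  # peak load as a share of total max capacity


def plan_capacity(workers: int, latency_ms: float,
                  peak_rps: float, headroom_pct: float) -> CapacityPlan:
    """Apply the capacity formulas in order (hypothetical helper)."""
    if workers <= 0 or latency_ms <= 0 or peak_rps < 0:
        raise ValueError("workers and latency must be positive, peak non-negative")
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom must be at least 0 and below 100")

    # Same as workers / (latency_ms / 1000), written as a single division.
    max_rps = workers * 1000 / latency_ms
    safe_rps = max_rps * (1 - headroom_pct / 100)
    # Assumption: always report at least one server, even for a zero peak.
    servers = max(1, math.ceil(peak_rps / safe_rps))
    utilization_pct = peak_rps / (servers * max_rps) * 100
    return CapacityPlan(max_rps, safe_rps, servers, utilization_pct)


plan = plan_capacity(workers=50, latency_ms=100, peak_rps=2000, headroom_pct=30)
print(plan.servers, round(plan.safe_rps, 1), round(plan.utilization_pct, 1))
# prints: 6 350.0 66.7
```

With 50 workers, 100 ms latency, a 2,000 req/s peak, and 30 percent headroom, the sketch sizes the fleet at 6 servers running at about 67 percent of their theoretical maximum.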
Capacity planning notes
- Little's Law is an exact queueing-theory identity, not an empirical estimate.
- Latency variance, not just the average, drives tail-latency spikes near saturation.
- CPU, memory, and connection limits can cap throughput below the worker-based figure.
- A 30 percent headroom caps planned utilization at 70 percent, a common reliability target; rounding up to whole servers usually leaves the actual figure slightly lower.
- Always confirm with load testing against your real application and traffic shape.
Server capacity: frequently asked questions
How many requests per second can one server handle?
By Little's Law, the sustainable throughput equals the number of concurrent workers divided by the average request latency in seconds. A server with 50 workers handling 100 ms requests can serve about 50 / 0.1 = 500 requests per second at full utilization.
What is Little's Law?
Little's Law states that the average number of requests in a system equals the arrival rate times the average time each request spends in the system. Rearranged, throughput equals concurrency divided by latency. It is a fundamental, exact result of queueing theory.
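Written out, the rearrangement looks like this. The symbols are the conventional ones rather than anything the calculator defines: L is the average number of requests in the system, λ is the arrival rate (equal to throughput in steady state), and W is the average time a request spends in the system. Treating L as the worker count assumes every worker is busy and no requests are queued outside the worker pool.

```latex
L = \lambda W
\quad\Longrightarrow\quad
\lambda = \frac{L}{W},
\qquad\text{e.g.}\quad
\lambda = \frac{50\ \text{requests}}{0.1\ \text{s}} = 500\ \text{requests/s}
```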
Why include a headroom target?
Running a server at 100 percent utilization causes queueing and latency spikes. A headroom or safety margin (commonly 30 percent) leaves capacity for bursts. The calculator derates the theoretical maximum by your chosen headroom percentage before sizing.
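To see how the headroom setting trades server count against utilization, the sketch below sweeps a few headroom values. It reuses the 500 req/s per-server maximum from the 50-worker, 100 ms example; the 1,800 req/s peak and the helper name `servers_for` are hypothetical, chosen only for illustration.

```python
import math

MAX_RPS = 500.0    # 50 workers / 0.1 s, as in the per-server example
PEAK_RPS = 1800.0  # hypothetical peak load


def servers_for(headroom_pct: float) -> tuple:
    """Return (servers, utilization %) for one headroom setting."""
    safe_rps = MAX_RPS * (1 - headroom_pct / 100)
    servers = math.ceil(PEAK_RPS / safe_rps)
    return servers, PEAK_RPS / (servers * MAX_RPS) * 100


for headroom in (0, 20, 30, 50):
    servers, utilization = servers_for(headroom)
    print(f"headroom {headroom:>2}% -> {servers} servers at {utilization:.0f}% utilization")
```

Under these assumed inputs, moving from 0 to 30 percent headroom raises the fleet from 4 servers at 90 percent utilization to 6 servers at 60 percent. Because servers come in whole units, the resulting utilization lands at or below the target rather than exactly on it.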
How many servers do I need?
Divide your peak required requests per second by the safe per-server throughput, then round up. The calculator reports the exact ratio and rounds up to whole servers, since you cannot deploy a fraction of a machine.
Are these figures exact for my workload?
The arithmetic is exact, but real systems have variable latency, CPU limits, and memory constraints. Treat the result as a first-order capacity estimate, then validate with load testing against your actual application.
Reviewed by the CalculatorHub team, edited by James Graham, 16 June 2026. See our methodology.