A/B Test Significance Calculator
A/B testing is the gold standard for making data-driven decisions in product, marketing, and UX. But a difference in conversion rates between two variants is only meaningful if it is statistically significant, meaning the difference is unlikely to have occurred by chance. This calculator uses the two-proportion z-test to determine whether the difference in conversion rates between your control and variant is statistically significant. Enter the number of visitors and conversions for each group, and the calculator returns the z-score, approximate p-value, and confidence level so you can decide with confidence whether to ship the winning variant.
A/B test significance formula (two-proportion z-test)
n1 = Control visitors, n2 = Variant visitors
p1 = Control conversions / n1
p2 = Variant conversions / n2
p_pool = (Control conversions + Variant conversions) / (n1 + n2)
SE = sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
z = (p2 - p1) / SE
Confidence = 2 * Phi(abs(z)) - 1, where Phi is the standard normal CDF
The two-proportion z-test compares the conversion rates of two groups and tests whether the difference is statistically significant.
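The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names `normal_cdf` and `two_proportion_z_test` and the sample numbers are assumptions made for the example, and the normal CDF is computed from the standard library's error function.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_z_test(control_conv, control_visitors, variant_conv, variant_visitors):
    """Return (z, confidence) for a pooled two-proportion z-test.

    Hypothetical helper for illustration; confidence is a fraction in [0, 1].
    """
    if control_visitors <= 0 or variant_visitors <= 0:
        raise ValueError("each group needs at least one visitor")

    p1 = control_conv / control_visitors
    p2 = variant_conv / variant_visitors

    # Pooled rate: the conversion rate if both groups were really the same.
    p_pool = (control_conv + variant_conv) / (control_visitors + variant_visitors)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_visitors + 1 / variant_visitors))
    if se == 0:
        raise ValueError("standard error is zero (no conversions, or everyone converted)")

    z = (p2 - p1) / se
    confidence = 2 * normal_cdf(abs(z)) - 1
    return z, confidence


# Made-up example: 100 of 1,000 control visitors convert vs. 130 of 1,000 variant visitors.
z, confidence = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, confidence = {confidence:.1%}")
```

With these made-up inputs the z-score comes out at about 2.10 and the confidence a little over 96%. A positive z means the variant converted better than the control; a negative z means it converted worse.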
Interpreting results
- Confidence below 90%: not significant - do not make a decision yet.
- 90% to below 95%: marginal significance - consider running longer.
- 95% to below 99%: significant at the standard threshold.
- 99% and above: highly significant - strong evidence to ship the winner.
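The bands above can be expressed as a small lookup. This is a sketch under one assumption: the bands are treated as contiguous, so a fractional value such as 94.5% falls in the marginal band. The function name `interpret_confidence` is hypothetical.

```python
def interpret_confidence(confidence_pct: float) -> str:
    """Map a confidence level in percent (0 to 100) to an interpretation band."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence_pct < 90:
        return "not significant"
    if confidence_pct < 95:
        return "marginal"
    if confidence_pct < 99:
        return "significant"
    return "highly significant"


for level in (85.0, 92.3, 96.5, 99.4):
    print(level, "->", interpret_confidence(level))
```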
A/B testing: frequently asked questions
What is statistical significance in A/B testing?
Statistical significance indicates whether the observed difference between two variants is larger than random variation alone would plausibly produce. A 95% confidence level means that, if the variants truly performed the same, a difference at least this large would show up in only about 5% of tests.
What confidence level should I use?
95% (p-value below 0.05) is the standard for most A/B tests. Use 99% (p-value below 0.01) for high-stakes decisions with large revenue impact. 90% may be acceptable for low-risk tests with high traffic.
What is a p-value?
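The p-value is the probability of observing a difference at least as large as the one measured if there were actually no true difference between variants. A p-value of 0.05 means that, with no true difference, a result this extreme would occur about 5% of the time. It is not the probability that the result itself is a fluke.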
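As a rough illustration, a two-sided p-value can be computed from a z-score with the complementary error function in Python's standard library. The function name `p_value_from_z` is an assumption for this sketch. Under the two-sided definition used here, the p-value is simply 1 minus the confidence level.

```python
import math


def p_value_from_z(z: float) -> float:
    """Two-sided p-value for a z-score under the standard normal distribution."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the area in both tails.
    return math.erfc(abs(z) / math.sqrt(2.0))


for z in (1.0, 1.96, 2.58):
    print(f"z = {z:.2f} -> p = {p_value_from_z(z):.4f}")
```

A z-score of 1.96 gives a p-value of about 0.05, which is where the familiar 95% threshold comes from.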
How long should I run an A/B test?
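Run tests for at least one full business cycle (usually 1 to 2 weeks) and until you reach the minimum sample size required for your target statistical power. Stopping as soon as the result first looks significant inflates the false positive rate, because every extra check gives random noise another chance to cross the threshold.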
What is statistical power?
Power is the probability of detecting a true effect when one exists. The standard target is 80% power. Higher power requires larger sample sizes. Use the Minimum Detectable Effect calculator to plan required sample size.
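To make the link between power and sample size concrete, here is a minimal sketch of the common normal-approximation formula for the sample size per group in a two-sided two-proportion test. It is not necessarily the method used by the Minimum Detectable Effect calculator; the function name `sample_size_per_group` and the example rates are assumptions. It relies on `statistics.NormalDist` from the Python standard library (Python 3.8+).

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, variant_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed per group to detect baseline_rate vs. variant_rate."""
    if not (0 < baseline_rate < 1 and 0 < variant_rate < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    if baseline_rate == variant_rate:
        raise ValueError("rates must differ; there is no effect to detect")

    # Critical values for a two-sided test at level alpha and for the target power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    p_bar = (baseline_rate + variant_rate) / 2
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(
        baseline_rate * (1 - baseline_rate) + variant_rate * (1 - variant_rate)
    )
    return math.ceil((null_term + alt_term) ** 2 / (variant_rate - baseline_rate) ** 2)


# Made-up example: detect a lift from a 10% to a 12% conversion rate.
print(sample_size_per_group(0.10, 0.12))              # 80% power
print(sample_size_per_group(0.10, 0.12, power=0.90))  # 90% power needs more visitors
```

For a lift from 10% to 12% at 95% confidence and 80% power, this approximation asks for roughly 3,800 to 3,900 visitors per group, and smaller lifts push the requirement up quickly.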
Reviewed by the CalculatorHub team, edited by James Graham, 14 June 2026. See our methodology.