A/B Test Significance Calculator
Compare conversion rates across variants. Pick a baseline (usually Control A), and we’ll compute p-value, confidence, lift, and a significance call for each variant vs the baseline using a two-proportion z-test. Supports A/B/n (multiple variants) — each variant is compared to your chosen baseline.
Background
Each comparison is a standard two-proportion z-test (baseline vs one variant). For true A/B/n experiments, be careful with repeated comparisons (multiple testing) and stopping rules. This tool is ideal for fast checks and clear reporting.
How to use this calculator
- Enter visitors and conversions for each variant.
- Pick a baseline (usually the control) and choose the confidence level and hypothesis type (one-sided or two-sided).
- Click Calculate to see results for every variant vs the baseline.
What this calculator computes
- Rates: p = conversions / visitors for each variant
- Lift vs baseline: absolute and relative lift for each variant
- z-test: pooled standard error, z-stat, and p-value per comparison
- Confidence interval: for the difference (variant − baseline) using an unpooled SE (see the sketch below)
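Below is a minimal Python sketch of a single variant-vs-baseline comparison, mirroring the formulas listed later on this page. It is an illustration only, not the calculator's actual implementation; the function name two_proportion_ztest and its output format are my own, and the confidence interval is fixed at 95% (z* = 1.96).

```python
import math

def two_proportion_ztest(x0, n0, x1, n1, tail="two-sided"):
    """Compare a variant (x1 of n1) against a baseline (x0 of n0)."""
    p0, p1 = x0 / n0, x1 / n1
    lift = p1 - p0                                    # absolute lift (variant - baseline)
    pooled = (x0 + x1) / (n0 + n1)                    # pooled rate p-hat
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    z = lift / se_pooled

    # Normal tail probability via erfc: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    if tail == "greater":             # one-sided H1: variant > baseline
        p_value = 0.5 * math.erfc(z / math.sqrt(2))
    else:                             # two-sided H1: variant != baseline
        p_value = math.erfc(abs(z) / math.sqrt(2))

    # CI for the difference uses the unpooled SE; 1.96 is z* for 95% confidence
    se_unpooled = math.sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)
    ci95 = (lift - 1.96 * se_unpooled, lift + 1.96 * se_unpooled)
    return {"p0": p0, "p1": p1, "lift": lift, "z": z, "p_value": p_value, "ci95": ci95}
```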
Statistical vs Practical Significance
A result can be statistically significant but still not be practically meaningful.
- Statistical significance means the difference is unlikely due to chance at your chosen confidence level.
- Practical significance asks whether the lift is large enough to matter for your product or business goals.
- Always review effect size, confidence intervals, and guardrail metrics — not just the p-value — before shipping.
Tip: A tiny lift can be “significant” with huge samples, while a meaningful lift may be inconclusive with small samples.
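As a quick illustration of that tip, reusing the two_proportion_ztest sketch above with made-up traffic numbers:

```python
# Hypothetical inputs: a 0.1 pp lift with 1,000,000 visitors per arm
# vs a 2 pp lift with only 1,000 visitors per arm.
huge = two_proportion_ztest(100_000, 1_000_000, 101_000, 1_000_000)   # 10.0% -> 10.1%
small = two_proportion_ztest(100, 1_000, 120, 1_000)                  # 10.0% -> 12.0%
print(round(huge["p_value"], 3))    # ~0.019: "significant", yet only a 0.1 pp lift
print(round(small["p_value"], 3))   # ~0.153: inconclusive, despite a 2 pp lift
```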
Formula & Equation Used
Rates: p = x / n
Pooled rate: p̂ = (x0 + x1) / (n0 + n1)
SE (pooled): SE = √(p̂(1−p̂)(1/n0 + 1/n1))
z-stat: z = (p1 − p0) / SE
CI for Δ: Δ ± z* · √(p0(1−p0)/n0 + p1(1−p1)/n1), where z* is the standard-normal critical value for your confidence level (1.96 at 95%)
Example Problems & Step-by-Step Solutions
Example 1 — Classic A/B test (two-sided, 95%)
Variant A is the baseline. You observe 420 conversions out of 10,000 visitors for A and 510 out of 10,000 for B. Is B statistically different from A?
- Compute rates: pA = 420 / 10000 = 0.042, pB = 510 / 10000 = 0.051.
- Absolute lift: Δ = pB − pA = 0.009 (+0.90 pp).
- Pooled rate: p̂ = (420 + 510) / (10000 + 10000) = 0.0465.
- Pooled SE: SE = √(p̂(1−p̂)(1/nA + 1/nB)) ≈ 0.00298.
- z-statistic: z = Δ / SE ≈ 3.02.
- Answer: two-sided p ≈ 0.0025 < 0.05 ⇒ statistically significant. Variant B outperforms A.
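A quick check of these numbers with the two_proportion_ztest sketch from earlier (a hypothetical helper, not the calculator itself):

```python
r = two_proportion_ztest(420, 10_000, 510, 10_000)
print(round(r["z"], 2), round(r["p_value"], 4))   # ~3.02, ~0.0025
print([round(v, 3) for v in r["ci95"]])           # 95% CI for the lift: ~[0.003, 0.015]
```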
Example 2 — One-sided “B > A” decision rule
You pre-decided to ship only if B beats A. A has 1,500 conversions out of 25,000 visitors; B has 1,605 out of 25,000. Is B significantly better at 95% confidence?
- Rates: pA = 0.060, pB = 0.0642.
- Lift: Δ = 0.0042 (+0.42 pp).
- Pooled rate: p̂ = (1500 + 1605) / 50000 = 0.0621.
- Pooled SE: SE ≈ 0.00216.
- z-statistic: z ≈ 1.95.
- Answer: One-sided p ≈ 0.026 < 0.05 ⇒ B is significantly better than A. (A two-sided test would give p ≈ 0.052 and just miss the 5% bar, which is why the pre-registered one-sided rule matters here.)
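Checked with the same sketch, passing the one-sided tail:

```python
r = two_proportion_ztest(1_500, 25_000, 1_605, 25_000, tail="greater")
print(round(r["z"], 2), round(r["p_value"], 3))   # ~1.95, one-sided p ~0.026
```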
Example 3 — Baseline wins (variant is worse)
Baseline A has 1,000 conversions out of 20,000 visitors. Variant B has 860 out of 20,000. Did the variant hurt performance?
- Rates: pA = 0.050, pB = 0.043.
- Lift: Δ = −0.007 (−0.70 pp).
- Pooled rate: p̂ = (1000 + 860) / 40000 = 0.0465.
- Pooled SE: SE ≈ 0.00211.
- z-statistic: z ≈ −3.32.
- Answer: two-sided p ≈ 0.0009 < 0.05 and Δ < 0 ⇒ variant B is significantly worse. Baseline A is the winner.
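And the same check for the losing variant:

```python
r = two_proportion_ztest(1_000, 20_000, 860, 20_000)
print(round(r["z"], 2), round(r["p_value"], 4))   # ~-3.32, ~0.0009
```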
Frequently Asked Questions
Q: What does “statistically significant” mean?
It means the observed difference would be unlikely under the “no difference” (null) assumption at your chosen significance level α (α = 1 − confidence level, e.g. 0.05 at 95%).
Q: A/B/n — is it OK to compare many variants?
You can, but multiple comparisons increase false positives. Treat this as a fast check, and consider a multiple-testing correction for final decisions.
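For instance, a minimal sketch of a Bonferroni adjustment, assuming you already have one p-value per variant-vs-baseline comparison (the p-values below are made up):

```python
alpha = 0.05
p_values = {"B": 0.012, "C": 0.034, "D": 0.200}   # hypothetical: each variant vs the baseline
adjusted_alpha = alpha / len(p_values)             # Bonferroni: split alpha across comparisons
for variant, p in p_values.items():
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{variant}: p={p:.3f} vs adjusted alpha={adjusted_alpha:.4f} -> {verdict}")
```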
Q: Two-sided or one-sided?
Use two-sided unless your decision rule was set in advance (e.g., only ship if the variant beats the baseline).
Q: Does statistical significance guarantee the variant is better?
No. Significance indicates confidence in a difference, not whether the impact is large enough to matter. Consider lift, confidence intervals, and guardrails.