A/B Test Significance Calculator
Compare conversion rates across variants. Pick a baseline (usually Control A), and the calculator computes a p-value, confidence interval, lift, and a significance call for each variant vs the baseline using a two-proportion z-test. A/B/n experiments with multiple variants are supported: each variant is compared to your chosen baseline.
Background
Each comparison is a standard two-proportion z-test (baseline vs one variant). For true A/B/n experiments, repeated comparisons inflate the false-positive rate (the multiple-testing problem), and "peeking" at results before the planned sample size is reached can also bias them. Treat this tool as a fast check and a clear way to report results rather than a substitute for a pre-registered analysis.
How to use this calculator
- Enter visitors + conversions for each variant.
- Choose a baseline, a significance level α, and a one- or two-sided hypothesis.
- Click Calculate to see results for every variant vs the baseline.
What this calculator computes
- Conversion rate: p = x / n for each variant
- Absolute lift (Δ): variant − baseline (in percentage points)
- Relative lift: Δ / baseline
- z-test + p-value: per variant vs baseline comparison
- Confidence interval: for Δ (variant − baseline)
Note: This is a fast, standard approximation. If your experiment is small or highly skewed, consider confirming with an exact test / more data.
Statistical vs Practical Significance
A result can be statistically significant but still not be practically meaningful.
- Statistical significance means the difference is unlikely due to chance at your chosen α.
- Practical significance asks whether the lift is large enough to matter for your product or business goals.
- Always review effect size, confidence intervals, and guardrail metrics — not just p-values — before shipping.
Tip: A tiny lift can be “significant” with huge samples, while a meaningful lift may be inconclusive with small samples.
Formula & Equation Used
Rates: p = x / n
Pooled rate: p̂ = (x0 + x1) / (n0 + n1)
SE (pooled): SE = √(p̂(1−p̂)(1/n0 + 1/n1))
z-stat: z = (p1 − p0) / SE
CI for Δ: Δ ± z* · √(p0(1−p0)/n0 + p1(1−p1)/n1), where z* is the standard-normal critical value (1.96 for a 95% CI)
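The formulas above can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the calculator's actual code; the function name `two_proportion_ztest` is an assumption.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def two_proportion_ztest(x0, n0, x1, n1, alpha=0.05, two_sided=True):
    """z-test of a variant (x1/n1) against a baseline (x0/n0).

    Returns (z, p_value, ci), where ci is the (1 - alpha)
    confidence interval for delta = p1 - p0.
    """
    p0, p1 = x0 / n0, x1 / n1
    delta = p1 - p0
    # The test statistic uses the pooled standard error
    pooled = (x0 + x1) / (n0 + n1)
    se_pooled = (pooled * (1 - pooled) * (1 / n0 + 1 / n1)) ** 0.5
    z = delta / se_pooled
    if two_sided:
        p_value = 2 * (1 - _N.cdf(abs(z)))
    else:
        p_value = 1 - _N.cdf(z)  # directional: H1 is p1 > p0
    # The CI uses the unpooled SE, matching the formula above
    se_ci = (p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1) ** 0.5
    z_star = _N.inv_cdf(1 - alpha / 2) if two_sided else _N.inv_cdf(1 - alpha)
    return z, p_value, (delta - z_star * se_ci, delta + z_star * se_ci)
```

For the numbers in Example 1 below, `two_proportion_ztest(400, 10000, 520, 10000)` gives z ≈ 4.05 with a p-value well under 0.05.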
Example Problems & Step-by-Step Solutions
Example 1 — Classic A/B (two-sided)
Variant A (baseline): 10,000 visitors, 400 conversions. Variant B: 10,000 visitors, 520 conversions. Use α = 0.05, two-sided.
- Rates: pA = 400/10000 = 0.0400, pB = 520/10000 = 0.0520
- Absolute lift: Δ = pB − pA = 0.0120 (+1.20 pp)
- Pooled rate: p̂ = (400+520)/(10000+10000) = 0.0460
- Pooled SE: √(0.046 · 0.954 · (1/10000 + 1/10000)) ≈ 0.00296, so z = 0.0120 / 0.00296 ≈ 4.05 and the two-sided p-value ≈ 0.00005. Since p ≤ 0.05, the result is significant.
- Interpretation: Look at p-value + the CI for Δ. If the CI stays above 0, B likely beats A.
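The steps of Example 1 can be checked with a short, self-contained script (values in the comments are rounded):

```python
from statistics import NormalDist

x_a, n_a, x_b, n_b = 400, 10_000, 520, 10_000
p_a, p_b = x_a / n_a, x_b / n_b                 # 0.0400, 0.0520
pooled = (x_a + x_b) / (n_a + n_b)              # 0.0460
se = (pooled * (1 - pooled) * (1/n_a + 1/n_b)) ** 0.5
z = (p_b - p_a) / se                            # ≈ 4.05
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # ≈ 5e-05, well below 0.05
# 95% CI for the absolute lift, using the unpooled SE (1.96 = z* for 95%)
se_ci = (p_a*(1-p_a)/n_a + p_b*(1-p_b)/n_b) ** 0.5
lo = (p_b - p_a) - 1.96 * se_ci                 # ≈ +0.0062
hi = (p_b - p_a) + 1.96 * se_ci                 # ≈ +0.0178
```

The whole interval sits above 0, which is what "B likely beats A" means concretely.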
Example 2 — One-sided “B > A” decision rule
Variant A (baseline): 5,000 visitors, 150 conversions. Variant B: 5,000 visitors, 210 conversions. Use α = 0.05, one-sided (B > A).
- Rates: pA = 150/5000 = 0.0300, pB = 210/5000 = 0.0420
- Absolute lift: Δ = 0.0420 − 0.0300 = 0.0120 (+1.20 pp)
- Relative lift: Δ / pA = 0.0120 / 0.0300 = 0.40 (+40%)
- Because the hypothesis is directional, the calculator uses a one-sided p-value: here z ≈ 3.22 and p ≈ 0.0006.
- Interpretation: If p ≤ 0.05, you have evidence that B improves over A under your pre-set rule.
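The one-sided calculation for Example 2 only differs in the final tail probability, as this sketch shows:

```python
from statistics import NormalDist

x_a, n_a, x_b, n_b = 150, 5_000, 210, 5_000
p_a, p_b = x_a / n_a, x_b / n_b                 # 0.0300, 0.0420
pooled = (x_a + x_b) / (n_a + n_b)              # 0.0360
se = (pooled * (1 - pooled) * (1/n_a + 1/n_b)) ** 0.5
z = (p_b - p_a) / se                            # ≈ 3.22
# One-sided (B > A): only the upper tail counts against H0
p_one_sided = 1 - NormalDist().cdf(z)           # ≈ 0.0006
```

With a two-sided test the p-value would simply double, which here still clears α = 0.05; the one-sided rule matters most for borderline results.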
Example 3 — A/B/C with Bonferroni (multiple comparisons)
Baseline A: 12,000 visitors, 540 conversions. Variant B: 12,000 visitors, 660 conversions. Variant C: 12,000 visitors, 690 conversions. Use α = 0.05, two-sided, and Bonferroni.
- Rates: pA = 540/12000 = 0.0450, pB = 0.0550, pC = 0.0575
- Compute each comparison vs baseline: B vs A and C vs A
- Bonferroni adjusts alpha for the number of comparisons: α′ = 0.05 / 2 = 0.025
- Decision rule becomes: significant only if p ≤ 0.025 (more conservative)
- Interpretation: This reduces false positives when testing multiple variants, but can make it harder to “declare a winner.”
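The Bonferroni decision rule for Example 3 can be sketched as follows; the helper `z_and_p` is illustrative, and the per-comparison test is the pooled two-sided z-test defined in the formula section:

```python
from statistics import NormalDist

def z_and_p(x0, n0, x1, n1):
    """Two-sided two-proportion z-test with pooled SE."""
    p0, p1 = x0 / n0, x1 / n1
    pooled = (x0 + x1) / (n0 + n1)
    se = (pooled * (1 - pooled) * (1/n0 + 1/n1)) ** 0.5
    z = (p1 - p0) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

baseline = (540, 12_000)
variants = {"B": (660, 12_000), "C": (690, 12_000)}
alpha_adj = 0.05 / len(variants)  # Bonferroni: 0.05 / 2 = 0.025
for name, (x, n) in variants.items():
    z, p = z_and_p(*baseline, x, n)
    # Declare significance only against the stricter adjusted threshold
    print(f"{name}: z = {z:.2f}, significant = {p <= alpha_adj}")
```

In this example both B (z ≈ 3.55) and C (z ≈ 4.39) clear the adjusted 0.025 threshold, so the correction does not change the verdict here; with weaker lifts it often would.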
Frequently Asked Questions
Q: What does “statistically significant” mean?
It means the observed difference is unlikely under the “no difference” assumption at your chosen α.
Q: A/B/n — is it OK to compare many variants?
You can, but multiple comparisons increase false positives. Consider Bonferroni for a conservative check, and treat this tool as a fast readout.
Q: Two-sided or one-sided?
Use two-sided unless your decision rule was set in advance (e.g., only ship if the variant beats baseline).
Q: Does statistical significance guarantee the variant is better?
No. Significance indicates confidence in a difference, not whether the impact is large enough to matter. Consider lift, confidence intervals, and guardrail metrics.