
Inputs & options

  • Enter visitors and conversions for each variant. Conversions must be ≤ visitors; commas and spaces are OK.
  • Use the original experience as the baseline (often Variant A). You can add up to Variant H; each additional variant is compared against the baseline.
  • If p ≤ α, the comparison is called significant (under this model). Use two-sided unless your decision rule was set in advance.
  • Bonferroni uses α′ = α / (number of comparisons) to reduce false positives when testing many variants.
  • If the winner option is unchecked, “Winner” marks the strongest significant difference (two-sided), which can label the baseline as the winner when another variant is significantly worse.
  • Chips prefill common A/B testing scenarios and run the significance check.

How to use this calculator

  • Enter visitors and conversions for each variant.
  • Choose a baseline, your α, and your hypothesis (two-sided or one-sided).
  • Click Calculate to see results for every variant vs the baseline.

What this calculator computes

  • Conversion rate: p = x / n for each variant
  • Absolute lift (Δ): variant − baseline (in percentage points)
  • Relative lift: Δ / baseline
  • z-test + p-value: per variant vs baseline comparison
  • Confidence interval: for Δ (variant − baseline)

Note: This is a fast, standard approximation. If your experiment is small or highly skewed, consider confirming with an exact test or by collecting more data.
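
To make these quantities concrete, here is a minimal Python sketch (the function name is illustrative, not the calculator's internals) that computes the conversion rate, absolute lift in percentage points, and relative lift for a variant against the baseline.

```python
def rates_and_lifts(baseline_conversions, baseline_visitors,
                    variant_conversions, variant_visitors):
    """Conversion rate, absolute lift (percentage points), relative lift."""
    p0 = baseline_conversions / baseline_visitors   # baseline rate
    p1 = variant_conversions / variant_visitors     # variant rate
    delta = p1 - p0                                 # absolute lift, as a proportion
    return {
        "baseline_rate": p0,
        "variant_rate": p1,
        "absolute_lift_pp": delta * 100,            # in percentage points
        "relative_lift": delta / p0 if p0 else float("nan"),
    }

# Example 1's inputs: A = 400 / 10,000 and B = 520 / 10,000
print(rates_and_lifts(400, 10_000, 520, 10_000))
# -> baseline 4.00%, variant 5.20%, +1.20 pp absolute, +30% relative
```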

Statistical vs Practical Significance

A result can be statistically significant but still not be practically meaningful.

  • Statistical significance means the difference is unlikely due to chance at your chosen α.
  • Practical significance asks whether the lift is large enough to matter for your product or business goals.
  • Always review effect size, confidence intervals, and guardrail metrics — not just p-values — before shipping.

Tip: A tiny lift can be “significant” with huge samples, while a meaningful lift may be inconclusive with small samples.

Formula & Equation Used

Rates: p = x / n (x = conversions, n = visitors; subscript 0 = baseline, 1 = variant)

Pooled rate: p̂ = (x0 + x1) / (n0 + n1)

SE (pooled): SE = √(p̂(1−p̂)(1/n0 + 1/n1))

z-stat: z = (p1 − p0) / SE

CI for Δ: Δ ± z* · √(p0(1−p0)/n0 + p1(1−p1)/n1)
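
Below is a small Python sketch of these formulas using only the standard library (statistics.NormalDist for the normal CDF and quantile). The function name and return shape are assumptions for illustration; it mirrors the equations above rather than the calculator's actual code.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x0, n0, x1, n1, alpha=0.05, two_sided=True):
    """Pooled two-proportion z-test of variant (x1/n1) vs baseline (x0/n0),
    plus an unpooled confidence interval for the difference Δ = p1 − p0."""
    p0, p1 = x0 / n0, x1 / n1
    delta = p1 - p0

    # Pooled rate and pooled standard error (used for the test statistic)
    p_hat = (x0 + x1) / (n0 + n1)
    se_pooled = sqrt(p_hat * (1 - p_hat) * (1 / n0 + 1 / n1))
    z = delta / se_pooled

    # p-value: two-sided by default; one-sided tests "variant > baseline"
    normal = NormalDist()
    if two_sided:
        p_value = 2 * (1 - normal.cdf(abs(z)))
    else:
        p_value = 1 - normal.cdf(z)

    # CI for Δ uses the unpooled standard error and the 1 − α/2 quantile,
    # matching the CI formula above
    se_unpooled = sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)
    z_star = normal.inv_cdf(1 - alpha / 2)          # e.g. 1.96 for alpha = 0.05
    ci = (delta - z_star * se_unpooled, delta + z_star * se_unpooled)

    return {"delta": delta, "z": z, "p_value": p_value,
            "ci": ci, "significant": p_value <= alpha}
```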

Example Problems & Step-by-Step Solutions

Example 1 — Classic A/B (two-sided)

Variant A (baseline): 10,000 visitors, 400 conversions. Variant B: 10,000 visitors, 520 conversions. Use α = 0.05, two-sided.

  1. Rates: pA = 400/10000 = 0.0400, pB = 520/10000 = 0.0520
  2. Absolute lift: Δ = pB − pA = 0.0120 (+1.20 pp)
  3. Pooled rate: p̂ = (400+520)/(10000+10000) = 0.0460
  4. Compute z and p-value (two-proportion z-test). If p ≤ 0.05, call it significant.
  5. Interpretation: Look at p-value + the CI for Δ. If the CI stays above 0, B likely beats A.
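
A quick numeric pass over these steps in Python (standard library only), so the intermediate values can be checked by hand; the approximate figures in the comments follow from the pooled z-test formula above.

```python
from math import sqrt
from statistics import NormalDist

xA, nA, xB, nB = 400, 10_000, 520, 10_000

pA, pB = xA / nA, xB / nB          # 0.0400 and 0.0520
delta = pB - pA                    # 0.0120  (+1.20 pp)
p_hat = (xA + xB) / (nA + nB)      # 0.0460  (pooled rate)

se = sqrt(p_hat * (1 - p_hat) * (1 / nA + 1 / nB))
z = delta / se                                      # ≈ 4.05
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))    # ≈ 0.00005, well below 0.05

# 95% CI for Δ with the unpooled SE: stays above 0, consistent with "B beats A"
se_ci = sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB)
ci = (delta - 1.96 * se_ci, delta + 1.96 * se_ci)   # ≈ (+0.62 pp, +1.78 pp)
print(z, p_two_sided, ci)
```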

Example 2 — One-sided “B > A” decision rule

Variant A (baseline): 5,000 visitors, 150 conversions. Variant B: 5,000 visitors, 210 conversions. Use α = 0.05, one-sided (B > A).

  1. Rates: pA = 150/5000 = 0.0300, pB = 210/5000 = 0.0420
  2. Absolute lift: Δ = 0.0420 − 0.0300 = 0.0120 (+1.20 pp)
  3. Relative lift: Δ / pA = 0.0120 / 0.0300 = 0.40 (+40%)
  4. Because the hypothesis is directional, the calculator uses a one-sided p-value.
  5. Interpretation: If p ≤ 0.05, you have evidence that B improves over A under your pre-set rule.
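
The only change from Example 1 is the tail used for the p-value. A short, self-contained Python sketch (illustrative, not the calculator's code) of the directional computation for these counts:

```python
from math import sqrt
from statistics import NormalDist

xA, nA, xB, nB = 150, 5_000, 210, 5_000

pA, pB = xA / nA, xB / nB
delta = pB - pA                                     # 0.0120
p_hat = (xA + xB) / (nA + nB)                       # 0.0360 (pooled rate)
se = sqrt(p_hat * (1 - p_hat) * (1 / nA + 1 / nB))
z = delta / se                                      # ≈ 3.22

# Directional rule "B > A": use the upper tail only
p_one_sided = 1 - NormalDist().cdf(z)               # ≈ 0.0006
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))    # ≈ 0.0013, twice as large here
print(p_one_sided, p_two_sided)
```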

Example 3 — A/B/C with Bonferroni (multiple comparisons)

Baseline A: 12,000 visitors, 540 conversions. Variant B: 12,000 visitors, 660 conversions. Variant C: 12,000 visitors, 690 conversions. Use α = 0.05, two-sided, and Bonferroni.

  1. Rates: pA = 540/12000 = 0.0450, pB = 0.0550, pC = 0.0575
  2. Compute each comparison vs baseline: B vs A and C vs A
  3. Bonferroni adjusts alpha for the number of comparisons: α′ = 0.05 / 2 = 0.025
  4. Decision rule becomes: significant only if p ≤ 0.025 (more conservative)
  5. Interpretation: This reduces false positives when testing multiple variants, but can make it harder to “declare a winner.”
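
A minimal Bonferroni sketch in Python; the function name is illustrative and the p-values in the usage lines are hypothetical placeholders, since the point is the adjusted threshold rather than the specific numbers.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Flag each comparison as significant only if p ≤ α / (number of comparisons)."""
    alpha_adj = alpha / len(p_values)
    return {name: (p, p <= alpha_adj) for name, p in p_values.items()}, alpha_adj

# Hypothetical per-comparison p-values for B vs A and C vs A
decisions, alpha_adj = bonferroni_decisions({"B vs A": 0.030, "C vs A": 0.0004})
print(alpha_adj)   # 0.025
print(decisions)   # B vs A fails the stricter 0.025 cut; C vs A still passes
```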

Frequently Asked Questions

Q: What does “statistically significant” mean?

It means the observed difference is unlikely under the “no difference” assumption at your chosen α.

Q: A/B/n — is it OK to compare many variants?

You can, but multiple comparisons increase false positives. Consider Bonferroni for a conservative check, and treat this tool as a fast readout.

Q: Two-sided or one-sided?

Use two-sided unless your decision rule was set in advance (e.g., only ship if the variant beats baseline).

Q: Does statistical significance guarantee the variant is better?

No. Significance indicates confidence in a difference, not whether the impact is large enough to matter. Consider lift, confidence intervals, and guardrail metrics.