📌 Enter visitors and conversions for Control (A) and Variant (B). The calculator will test whether B is statistically different from A.
📌 Plan your A/B test before running it. Find the sample size per variant you need to reliably detect a given improvement in conversion rate.
Planner inputs:
- Baseline rate (%): current conversion rate of the control (0.01% to 99%)
- MDE (%): relative lift you want to detect (e.g. 20% = from 5% to 6%)
- Power: probability of detecting a true effect
- Daily traffic (visits): used to estimate test runtime in days
Test Result
⚠️ Disclaimer: Statistical significance is a necessary but not sufficient condition for making business decisions. Consider effect size, practical significance, and business context before acting on results.

Sources & Methodology

Two-proportion z-test formulas verified against NIST and standard statistical methods textbooks. Sample size formulas use exact power calculations.
📘
NIST/SEMATECH e-Handbook — Two-Proportion Z-Test
Reference for two-proportion z-test methodology, pooled proportion standard error calculation, and p-value determination for one-tailed and two-tailed tests.
📊
Cohen, J. (1988) — Statistical Power Analysis (2nd ed.)
Definitive reference for sample size calculations incorporating statistical power (1−β) and Type I error (α) for two-proportion tests, including the relationship between MDE, sample size, and power.
Significance test formulas:
p₁ = x₁/n₁ (control rate)   p₂ = x₂/n₂ (variant rate)
Pooled: p̂ = (x₁+x₂)/(n₁+n₂)   SE = √(p̂(1−p̂)(1/n₁+1/n₂))
z = (p₂−p₁) / SE   |   Uplift = (p₂−p₁)/p₁ × 100%
Sample size formula:
n = (zα/2 + zβ)² × (p₁(1−p₁)+p₂(1−p₂)) / (p₂−p₁)²
Normal CDF computed via the Abramowitz & Stegun rational approximation.

A/B Testing Statistics — Complete Guide to Significance, Sample Size & MDE

A/B testing (also called split testing) is the gold standard for evidence-based decision making in product development, conversion rate optimization (CRO), email marketing, and user experience design. The statistical foundation is the two-proportion z-test, which determines whether the observed difference in conversion rates between control and variant is real (statistically significant) or just random noise.

How A/B Test Statistical Significance Works

The null hypothesis (H⊂0;) states that both variants have the same true conversion rate. Statistical significance means rejecting this null hypothesis — concluding that the observed difference is too large to be explained by random chance at your chosen confidence level.

z = (p₂ − p₁) / √(p̂(1−p̂)(1/n₁ + 1/n₂))
Where: p₁ = control conversion rate, p₂ = variant conversion rate, p̂ = pooled rate.

Example: Control: 10,000 visitors, 500 conversions (5.0%). Variant: 10,000 visitors, 580 conversions (5.8%).
p̂ = (500+580)/(10,000+10,000) = 1080/20,000 = 0.054
SE = √(0.054 × 0.946 × (1/10,000 + 1/10,000)) = √(0.0000102) = 0.003196
z = (0.058 − 0.050) / 0.003196 = 0.008/0.003196 = 2.503
Two-tailed p = 2 × (1 − Φ(2.503)) = 0.0123
Result: p = 1.23% < 5% → Statistically significant at 95% confidence
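A two-proportion z-test like this can be computed with a short script. This is a minimal sketch using only the standard library (the function name `two_proportion_ztest` is illustrative, not part of the calculator); the inputs here are 500 and 580 conversions out of 10,000 visitors each, i.e. rates of 5.0% and 5.8%:

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF, Phi(z), via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_proportion_ztest(x1, n1, x2, n2, two_tailed=True):
    # Pooled two-proportion z-test: p-hat pooled across both arms
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - norm_cdf(abs(z))) if two_tailed else 1 - norm_cdf(z)
    return z, p_value

z, p = two_proportion_ztest(500, 10_000, 580, 10_000)
# z ≈ 2.503, two-tailed p ≈ 0.0123 → significant at 95%
```

Passing `two_tailed=False` returns the one-tailed p-value (≈0.0062 for the same data), which illustrates why one-tailed tests reach significance more easily.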

One-Tailed vs Two-Tailed A/B Tests

This is one of the most debated decisions in A/B testing methodology.

Most CRO practitioners and companies recommend two-tailed testing as the default. One-tailed is appropriate only when you know for certain that the change cannot possibly harm the metric of interest.

Sample Size Planning — The Most Important Step

Running an A/B test without a pre-planned sample size is one of the most common CRO mistakes. It leads to the peeking problem and inflated false positive rates. Calculate required sample size before you start, then do not check results until you have collected the full planned sample.

n per variant = (zα/2 + zβ)² × (p₁(1−p₁) + p₂(1−p₂)) / (p₂−p₁)²
Key inputs:
zα/2 = 1.960 for 95% confidence, 1.645 for 90%, 2.576 for 99%
zβ = 0.842 for 80% power, 1.036 for 85%, 1.282 for 90%

Example: Baseline 5%, detect 20% relative lift (5% → 6%), 95% CI, 80% power.
n = (1.960 + 0.842)² × (0.05 × 0.95 + 0.06 × 0.94) / (0.06 − 0.05)²
n = 7.851 × 0.1039 / 0.0001 ≈ 8,157 visitors per variant (8,165 with the calculator's exact power computation)
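The same formula can be evaluated in code. This is a minimal sketch (the helper `norm_ppf` uses bisection instead of a statistics library, and `sample_size_per_variant` is an illustrative name):

```python
from math import ceil, erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def norm_ppf(q):
    # Inverse normal CDF by bisection; ample precision for z-scores
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    # n = (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = norm_ppf(1 - alpha / 2)  # two-tailed critical value
    z_beta = norm_ppf(power)
    n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
    return ceil(n)  # always round up

n = sample_size_per_variant(0.05, 0.20)  # baseline 5%, detect 20% relative lift
# → 8155 with full-precision z-scores (rounded hand calculation gives ≈8,157)
```

At 500 visitors per day per variant, the estimated runtime is 8,155 / 500 ≈ 16.3 days.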

Minimum Detectable Effect (MDE) — Setting Realistic Test Goals

The MDE is the smallest relative improvement you want to be able to detect. Setting the MDE too small leads to impractically large sample requirements; setting it too large means real but smaller improvements go undetected.

Baseline Rate | MDE (Relative) | Absolute Change | n per variant (95% CI, 80% power) | Days at 500 visits/day
5% | 10% | 5% → 5.5% | 30,753 | 61.5
5% | 20% | 5% → 6% | 8,165 | 16.3
5% | 50% | 5% → 7.5% | 1,518 | 3.0
10% | 10% | 10% → 11% | 28,817 | 57.6
10% | 20% | 10% → 12% | 7,534 | 15.1
20% | 10% | 20% → 22% | 24,252 | 48.5

The Peeking Problem — Why You Cannot Check Early

The peeking problem is the #1 statistical error in A/B testing. If you check results multiple times and stop the test as soon as you see significance, your actual false positive rate is much higher than your nominal alpha. Research by Johari et al. (2015) showed that peeking at results 5 times and stopping when p<0.05 inflates the false positive rate from 5% to approximately 14%.

🚫 Do not peek: Pre-commit to your sample size, collect all the data, and check results exactly once. If you need ongoing monitoring, use sequential testing methods (like the alpha-spending approach or mSPRT) specifically designed to control false positives with multiple looks. Standard p-values are only valid for a single pre-specified sample size check.
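The inflation is easy to demonstrate with an A/A simulation, where both arms share the same true rate, so every "significant" result is a false positive. This is a minimal sketch (function names are illustrative; the exact rates vary with the seed and number of simulations):

```python
import random
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_stat(x1, n1, x2, n2):
    # Two-proportion z-statistic with pooled standard error
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x2 / n2 - x1 / n1) / se if se > 0 else 0.0

def aa_test(n_total, looks, rate=0.05):
    # One A/A test: returns whether any interim look crossed |z| > 1.96
    # (peeking) and whether the single final look did
    a = [random.random() < rate for _ in range(n_total)]
    b = [random.random() < rate for _ in range(n_total)]
    peeked = any(abs(z_stat(sum(a[:n]), n, sum(b[:n]), n)) > 1.96 for n in looks)
    final = abs(z_stat(sum(a), n_total, sum(b), n_total)) > 1.96
    return peeked, final

random.seed(7)
sims, looks = 500, [1000, 2000, 3000, 4000, 5000]  # 5 looks at the data
results = [aa_test(5000, looks) for _ in range(sims)]
peek_rate = sum(p for p, _ in results) / sims   # stop at first "significant" look
final_rate = sum(f for _, f in results) / sims  # single look at the planned n
# peek_rate comes out well above the nominal 5% (the text cites ≈14%
# for five looks); final_rate stays near 5%
```

The stop-at-first-significance strategy rejects far more often than the single pre-planned check, even though no true effect exists.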

Relative Uplift vs Absolute Uplift

Absolute uplift = variant rate − control rate. Easy to understand but ignores the base rate. Relative uplift = (variant − control) / control × 100. Better for comparing across different baselines. A 1% absolute improvement on a 2% baseline is a 50% relative improvement — massive. The same 1% absolute improvement on a 50% baseline is only 2% relative — modest. Always report both; misleading results often arise from reporting only one.
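The two measures are one line each in code (`uplifts` is an illustrative helper, not part of the calculator):

```python
def uplifts(p_control, p_variant):
    absolute = p_variant - p_control                      # in proportion units
    relative = (p_variant - p_control) / p_control * 100  # in percent
    return absolute, relative

abs_up, rel_up = uplifts(0.05, 0.058)
# 5.0% → 5.8%: 0.8 percentage points absolute, 16% relative
```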

Multiple Variants and Bonferroni Correction

When testing A vs B, C, D simultaneously (A/B/C/D test or multivariate test), each additional comparison inflates your false positive rate. With 3 variants tested against control (3 tests), the probability of at least one false positive at α=0.05 per test is 1−(0.95)³ = 14.3%. The Bonferroni correction divides α by the number of comparisons: for 3 tests, use α=0.05/3=0.017 per test. This is conservative; the Holm-Bonferroni step-down method is slightly more powerful.
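Both corrections are straightforward to implement. The sketch below uses hypothetical p-values (not from any real test) chosen to show a case where Holm-Bonferroni rejects more hypotheses than plain Bonferroni:

```python
def bonferroni(p_values, alpha=0.05):
    # Compare every p-value against the single threshold alpha / m
    m = len(p_values)
    return [p < alpha / m for p in p_values]

def holm_bonferroni(p_values, alpha=0.05):
    # Step-down: test p-values in ascending order against alpha / (m - rank),
    # stopping at the first non-rejection
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] < alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

pvals = [0.010, 0.020, 0.040]  # three variants vs control (hypothetical)
# bonferroni(pvals)      → [True, False, False] (threshold ≈0.0167 for all)
# holm_bonferroni(pvals) → [True, True, True]   (thresholds ≈0.0167, 0.025, 0.05)
```

Both methods control the family-wise error rate at alpha; Holm's sequentially relaxed thresholds are what make it uniformly more powerful.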

Frequently Asked Questions
What does statistical significance mean in an A/B test?
Statistical significance means the observed difference in conversion rates is unlikely to be due to random chance. At 95% confidence, there is only a 5% probability that you would see this large a difference if there were no true underlying effect. Measured by p-value: p<0.05 = significant at 95%, p<0.01 = significant at 99%. A significant result does not automatically mean you should implement the change; consider effect size, practical significance, and business context too.
How do I calculate whether my A/B test result is significant?
Use a two-proportion z-test: (1) Compute conversion rates p₁=x₁/n₁ and p₂=x₂/n₂. (2) Pooled proportion p̂=(x₁+x₂)/(n₁+n₂). (3) SE=√(p̂(1−p̂)(1/n₁+1/n₂)). (4) z=(p₂−p₁)/SE. (5) Convert z to a p-value via the standard normal distribution. For two-tailed at 95%: significant if |z|>1.960. This calculator handles all steps automatically.
How many visitors do I need per variant?
n per variant = (zα/2+zβ)² × (p₁(1−p₁)+p₂(1−p₂)) / (p₂−p₁)². For baseline 5%, 20% relative MDE (to 6%), 95% CI, 80% power: n=8,165 per variant. Higher confidence, higher power, or smaller MDE all increase n. Always use the sample size planner tab above BEFORE starting the test, not after.
What is the Minimum Detectable Effect (MDE)?
MDE is the smallest relative lift you want to reliably detect. Example: if baseline is 5% and you want to detect a 20% relative improvement, MDE=20% means you aim to detect a change from 5% to 6%. Choose MDE based on the smallest business-meaningful improvement: if a 5% relative lift is too small to be worth implementing, set MDE to at least 5%. Smaller MDE = more data required. Do not set MDE post-hoc based on what you observed.
Should I use a one-tailed or two-tailed test?
Two-tailed asks: is the variant different from control in either direction? Significance at 95%: |z|>1.960. Recommended default, because it also protects against regressions. One-tailed asks: is the variant better than control? Significance at 95%: z>1.645. It requires roughly 20% fewer observations (at 80% power) but ignores negative effects. Best practice: always use two-tailed unless you can guarantee the change cannot possibly hurt. Many companies have been burned by one-tailed tests that showed "significance" while the variant was actually hurting the metric.
What is the peeking problem?
Peeking means checking test results before collecting the full pre-planned sample and stopping when you see significance. Each check inflates the false positive rate. Checking 5 times and stopping at p<0.05 gives a ~14% actual false positive rate instead of 5%. Solutions: (1) Pre-commit to sample size and check once at the end. (2) Use sequential testing methods (mSPRT, always-valid p-values) designed for continuous monitoring. (3) Apply Bonferroni or alpha-spending corrections for multiple planned checks.
What is statistical power, and how much do I need?
Power = 1−β = probability of correctly detecting a true effect. Standard: 80% power means a 20% chance of missing a real improvement. Higher power needs more data: 90% power requires ~33% more visitors than 80% power. Low power wastes resources on tests that have little chance of detecting real improvements. Set power during test design; running an underpowered test means you are likely to miss real winners.
What is the difference between absolute and relative uplift?
Absolute uplift = variant rate − control rate. Example: 5.8%−5.0% = 0.8 percentage points. Relative uplift = (5.8−5.0)/5.0×100 = 16% relative improvement. Always report both. Relative uplift is more meaningful for comparing across different baselines: 0.8% absolute is a 16% relative improvement on a 5% base but only 1.6% relative on a 50% base. Misleading reporting often uses whichever number looks bigger.
How long should I run an A/B test?
Run time = required n per variant / daily traffic per variant. Always run for at least 1–2 full business cycles (7–14 days minimum) to capture day-of-week effects. Some products have strong weekly seasonality (weekdays vs weekends) that can distort short tests. Do not stop early even if the test looks significant. Minimum recommendation from most CRO practitioners: 7 days regardless of when statistical significance is reached.
What does an inconclusive result mean?
An inconclusive result means you failed to reject the null hypothesis, but it does NOT mean control and variant are identical. It means your data does not provide sufficient evidence to conclude a true difference exists at your chosen confidence level. Options: (1) Collect more data (check your power). (2) Reconsider the MDE: was the effect too small to detect with your traffic? (3) Investigate whether the test was contaminated. (4) If the test ran long enough with good power, declare the change neutral and do not implement.
Should I use frequentist or Bayesian A/B testing?
Frequentist (this calculator): uses p-values and pre-specified sample size. Standard in most companies. Clear stopping rules but requires pre-commitment. Bayesian: calculates P(variant > control) directly, allows flexible stopping, gives more intuitive results, but requires prior beliefs and is harder to implement correctly. Both are valid. Frequentist with good study design (this calculator) is the most widely understood and replicated approach. For continuous monitoring with no fixed sample size, Bayesian or sequential frequentist methods are better choices.
How do I handle multiple variants (A/B/n tests)?
When testing multiple variants (A vs B, C, D), divide α by the number of comparisons. For 3 variants vs control (3 tests) at 95% CI: α=0.05/3=0.017 per test → z must exceed 2.39. This prevents inflation of the family-wise false positive rate: each test must be more stringent to maintain the overall 5% error rate. Alternatively, use the Holm-Bonferroni sequential step-down method, which is more powerful than the simple Bonferroni correction.