A/B Test Calculator
Calculate statistical significance for your A/B tests
Frequently Asked Questions
What confidence level should I use?
95% is the industry standard for most A/B tests. Use 99% for critical changes that could significantly impact revenue or user experience. Never go below 90% confidence.
How long should I run my test?
Run tests for at least 1-2 weeks to capture weekly patterns (weekday vs weekend behavior). For e-commerce, run through at least one full purchase cycle.
What if my test never reaches significance?
If after 4 weeks you haven't reached significance, the change likely has minimal impact. You can either accept the variant isn't better or test a more dramatic change.
What is Statistical Significance?
Statistical significance tells you whether the difference between your control and variant reflects a real effect or just random chance. A 95% confidence level (the industry standard) means that, if there were truly no difference between the variations, a result this extreme would occur by random variation only 5% of the time.
How Statistical Significance is Calculated
1. Calculate Conversion Rates
Control CR = Control Conversions / Control Visitors
Variant CR = Variant Conversions / Variant Visitors
2. Calculate Z-Score
The z-score measures how many standard errors the difference between the variant and control conversion rates is from zero. The larger the absolute z-score, the less likely the difference is due to chance.
3. Determine Confidence Level
Z-score ≥ 1.96 = 95% confidence (statistically significant)
Z-score ≥ 2.576 = 99% confidence (highly significant)
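The three steps above can be sketched as a standard two-proportion z-test with a pooled standard error (a minimal illustration; the function name and the example numbers are hypothetical, not taken from the calculator itself):

```python
import math

def ab_test_z_score(control_visitors, control_conversions,
                    variant_visitors, variant_conversions):
    """Two-proportion z-test using a pooled standard error."""
    # Step 1: conversion rates
    control_cr = control_conversions / control_visitors
    variant_cr = variant_conversions / variant_visitors

    # Step 2: z-score = difference / pooled standard error
    pooled = (control_conversions + variant_conversions) / \
             (control_visitors + variant_visitors)
    se = math.sqrt(pooled * (1 - pooled) *
                   (1 / control_visitors + 1 / variant_visitors))
    z = (variant_cr - control_cr) / se

    # Step 3: compare against the two-tailed critical values
    if abs(z) >= 2.576:
        confidence = "99%"
    elif abs(z) >= 1.96:
        confidence = "95%"
    else:
        confidence = "not significant"
    return z, confidence

# Example: 10.0% vs 13.0% conversion on 1,000 visitors per variation
z, conf = ab_test_z_score(1000, 100, 1000, 130)
```

With these inputs the z-score lands a bit above 1.96, so the lift is significant at 95% but not at 99%.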
Sample Size Guidelines
Minimum Sample
1,000+
Per variation for basic tests
Recommended Sample
5,000+
Per variation for reliable results
Small Changes
10,000+
Needed to detect small lifts (<5%)
High Confidence
20,000+
For critical business decisions
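These guidelines follow from the standard sample-size formula for comparing two proportions. The sketch below assumes 95% confidence and 80% power (common defaults, not stated above) and shows why small relative lifts need far more traffic:

```python
import math

def sample_size_per_variant(baseline_cr, relative_lift,
                            z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variation to detect the lift.

    z_alpha: two-tailed critical value for 95% confidence.
    z_beta: one-tailed value for 80% power (assumed defaults).
    """
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    delta = p2 - p1
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# A 5% relative lift on a 10% baseline needs tens of thousands of
# visitors per variation; a 20% lift needs only a few thousand.
n_small_lift = sample_size_per_variant(0.10, 0.05)
n_large_lift = sample_size_per_variant(0.10, 0.20)
```

The smaller the lift you want to detect, the more the required sample grows (roughly with the inverse square of the lift), which is why the table above jumps from 1,000+ to 10,000+ for small changes.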
Common A/B Testing Mistakes
1. Stopping Tests Too Early
Wait for statistical significance AND run for at least 1-2 full business cycles
2. Testing Too Many Variations
More variations = more traffic needed. Stick to 2-3 variations max
3. Ignoring External Factors
Seasonality, marketing campaigns, and holidays can skew results
4. Testing Multiple Changes
Test one change at a time to know what caused the difference
5. Not Accounting for Novelty Effect
Initial lift may fade as users get used to the change. Run tests for 2+ weeks
When to Stop Your Test
- Reached Significance: 95%+ confidence with sufficient sample size
- Completed Full Cycle: At least 1-2 weeks to account for weekly patterns
- No Movement: If after 4+ weeks there's no trend toward significance, stop
- Negative Impact: If the variant is significantly worse, stop immediately