In the ever-evolving world of digital optimisation, A/B testing remains a cornerstone of data-driven decision-making. Yet, despite its widespread adoption, many organisations still struggle to unlock its full potential. Poor planning, misunderstood metrics, and rushed conclusions often lead to wasted resources and missed opportunities.
If you’ve ever wondered how to ensure your A/B tests deliver actionable insights instead of confusion, you’re in the right place. This guide dives into the eight essential principles for designing A/B tests that succeed, from crafting a hypothesis grounded in evidence to calculating the right sample size and avoiding common pitfalls like “peeking” at results too soon.

- Create a strong hypothesis
- Understand statistical significance
- Don’t confuse statistical significance with statistical power
- Determine the correct sample size
- Randomise your audience
- Estimate Potential Impact and Minimal Detectable Effect (MDE)
- Calculate the Appropriate Run Time
- Don’t peek at the results
Whether you’re a seasoned analyst or just starting with experimentation, these tips will empower you to design smarter, more impactful tests that fuel growth and innovation. Let’s transform your A/B testing from guesswork into a powerful engine for measurable success.
If you missed the first part of the series, or would like to refresh the fundamentals of AB testing, check it out in my previous post, AB Testing: the “Golden Egg” of Product Development.
1. Create a strong hypothesis
A hypothesis is a clear, testable statement that proposes a potential solution to a problem or an explanation for a specific behaviour. It acts as a bridge between an observed issue and the proposed action you believe will address it, supported by evidence or rationale.
In the context of A/B testing, a hypothesis guides your experiment by outlining:
| Element | Question to answer | Example |
| --- | --- | --- |
| The problem | What issue or opportunity are you addressing? | Low conversion rates on a checkout page. |
| The solution | What change or action are you testing to solve it? | Adding trust badges to the checkout page to build credibility. |
| Expected outcome | What measurable result do you anticipate, and why? | Increasing conversion rates by 15%. |
Hypothesis Formula
If [specific change] is implemented, then [desired outcome] will occur because [supporting evidence or rationale].
Example:
If trust badges are added to the checkout page, then conversion rates will increase by 15% because customer surveys indicate users hesitate due to security concerns.
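One lightweight way to keep hypotheses consistent across a team is to encode the formula as a template. This is just an illustrative sketch (the function and field names are my own, not a standard tool):

```python
# Enforce the "If / then / because" hypothesis structure with a template.
# The field names (change, outcome, rationale) are illustrative.
HYPOTHESIS_TEMPLATE = "If {change}, then {outcome} because {rationale}."

def build_hypothesis(change: str, outcome: str, rationale: str) -> str:
    """Return a hypothesis sentence following the If/then/because formula."""
    return HYPOTHESIS_TEMPLATE.format(
        change=change, outcome=outcome, rationale=rationale
    )

print(build_hypothesis(
    change="trust badges are added to the checkout page",
    outcome="conversion rates will increase by 15%",
    rationale="customer surveys indicate users hesitate due to security concerns",
))
```

Filling the template forces every hypothesis to name its change, its expected outcome, and its supporting evidence.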
Where Hypotheses Come From
Solid hypotheses are based on evidence. Key sources include:
- Analytics: Identify drop-off points (e.g., high exit rates on payment pages).
- Heatmaps: Pinpoint ignored elements or distractions on a page.
- User Testing: Observe usability issues firsthand.
- Surveys: Understand user pain points, like doubts about pricing or navigation.
Common Mistakes to Avoid
- Vague Hypotheses: Avoid unclear goals like “this might increase engagement.”
- Lack of Evidence: Don’t rely on assumptions without data to back them.
- Ignoring Big-Picture Learning: Focus on customer insights, even if the test doesn’t result in positive lifts.
By crafting hypotheses carefully, you ensure your A/B tests yield meaningful insights, driving better decisions and long-term success.
2. Understand statistical significance
Statistical significance ensures that the observed differences between control and variant groups are unlikely to have occurred by chance. The significance level sets how much false-positive risk you accept. The industry standard is 95% confidence, meaning you accept at most a 5% chance of declaring a difference when none actually exists (a Type I error). For quick, low-stakes validations, 90% confidence may suffice.
Common Mistakes to Avoid
- Inadequate Sample Size: Testing with too few users can lead to misleading conclusions. For example, testing a new landing page with only 50 users might yield a 30% conversion rate that cannot be replicated with a larger sample size. Always calculate the required sample size before running a test.
- Stopping Tests Early: Cutting a test short risks basing decisions on random fluctuations.
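To make the 95% threshold concrete, here is a minimal two-proportion z-test using only the Python standard library. The visitor and conversion counts are made-up illustration numbers, and the pooled-proportion approximation assumes reasonably large samples:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Uses the pooled-proportion standard error,
    the usual large-sample approximation for A/B tests.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: control 1,000/10,000 (10%), variant 1,100/10,000 (11%)
z, p = two_proportion_z_test(1000, 10000, 1100, 10000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

With these made-up counts the difference comes out significant at the 95% level (p ≈ 0.02); halve the sample sizes and the same rates would no longer clear the bar.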
3. Don’t confuse statistical significance with statistical power
While significance focuses on minimising false positives, statistical power ensures you detect true differences, avoiding false negatives (Type II errors). Statistical power, typically set at 80%, is the likelihood of detecting a real effect if one exists. Higher power reduces the risk of missing meaningful changes.
Key Factors Influencing Power
- Effect Size: Larger differences are easier to detect.
- Sample Size: Larger sample sizes improve power.
- Significance Level: Lower alpha levels (e.g., 0.01) reduce false positives, but at a fixed sample size they also reduce power—so keeping the same power at a stricter alpha requires more data.
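These three factors can be tied together numerically. The sketch below approximates the power of a two-sided, two-proportion test with the standard normal approximation; the conversion rates and sample size are made-up planning values:

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-proportion z-test.

    p1, p2: true conversion rates of control and variant.
    n_per_group: users in each group.
    Uses the unpooled normal approximation, adequate for planning.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    shift = abs(p2 - p1) / se
    # Probability the test statistic clears the critical value on either side.
    return nd.cdf(shift - z_alpha) + nd.cdf(-shift - z_alpha)

# Hypothetical: 10% vs 12% conversion, 4,000 users per group
print(f"power ≈ {approx_power(0.10, 0.12, 4000):.2f}")
```

With these numbers the power lands just above the conventional 80% target; shrinking the sample, the effect size, or the alpha level pulls it below.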
4. Determine the correct sample size
Sample size is one of the most critical factors in ensuring your A/B test results are statistically valid and not the result of random chance. Larger sample sizes reduce the margin of error and increase the reliability of your conclusions.
Factors that affect sample size
- Baseline Conversion Rate: The current conversion rate of your control group. This serves as the reference point for comparison.
Example: If 10% of users convert on your control page, your baseline conversion rate is 10%.
- Significance Level: Typically 95%, representing the confidence that observed differences are not due to chance.
- Statistical Power: Often set at 80%, ensuring you have an 80% chance of detecting a real difference.
- Minimal Detectable Effect (MDE): The smallest improvement you want to detect relative to the baseline conversion rate.
Example: If your baseline conversion rate is 10% and you want to detect at least a 2% lift, your MDE is 2%.
I recommend using online A/B testing calculators (e.g., the Adobe Target Size Calculator) as a simple way to determine your sample size and runtime.
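If you want to see what these calculators do under the hood, the standard two-proportion sample-size formula can be sketched in a few lines. The 10% baseline and 2-point lift below are the example values from above:

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_group(baseline, mde_abs, alpha=0.05, power=0.80):
    """Required users per group for a two-sided, two-proportion test.

    baseline: control conversion rate (e.g. 0.10 for 10%).
    mde_abs:  minimal detectable effect in absolute terms
              (e.g. 0.02 for a 2-percentage-point lift).
    """
    nd = NormalDist()
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = nd.inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# 10% baseline, detect a 2-point lift at 95% significance / 80% power
print(sample_size_per_group(0.10, 0.02))  # ≈ 3,800–3,900 users per group
```

Note the result is per group, so a two-variant test needs roughly double that total; online calculators may differ by a few percent depending on the exact approximation they use.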
Common Mistakes to Avoid
- Skipping Sample Size Calculation: Starting a test without knowing the required sample size can lead to unreliable or inconclusive results.
- Underestimating Traffic Needs: Not having enough traffic to reach the required sample size leads to tests that run too long or fail to achieve statistical significance.
- Stopping Tests Too Early: Ending a test before reaching the required sample size risks misleading conclusions due to random fluctuations.
Always calculate sample size before starting your test. This helps you plan for the expected runtime and ensures the test will gather enough data to deliver valid, actionable insights.
5. Randomise your audience
Randomising your audience is essential for ensuring that your A/B test results are unbiased and reliable. Proper randomisation distributes users equally across control and variant groups, minimising the risk of external factors influencing your results.
Why Randomisation Matters
Without randomisation, external factors such as geography, device type, gender, or age could skew your results, making it impossible to attribute changes to the tested variation alone.
Use robust A/B testing tools that automatically handle randomisation. These tools ensure users are evenly distributed and external biases are minimised, giving you confidence in the validity of your results.
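Most testing tools handle this for you, but the core idea is simple deterministic bucketing: hash a stable user ID together with the experiment name, so each user always lands in the same group and different experiments get independent splits. A sketch (the function and experiment names are illustrative):

```python
import hashlib

def assign_group(user_id: str, experiment: str, variant_share: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'variant'.

    Hashing user_id together with the experiment name keeps the
    assignment stable per user and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex chars onto [0, 1) and compare to the split.
    bucket = int(digest[:8], 16) / 0x100000000
    return "variant" if bucket < variant_share else "control"

print(assign_group("user-42", "checkout-trust-badges"))
```

Because the assignment is a pure function of the inputs, a returning user sees the same variation on every visit, with no assignment table to store.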
6. Estimate Potential Impact and Minimal Detectable Effect (MDE)
What Is Minimal Detectable Effect (MDE)?
MDE is the smallest change or improvement in your key metric (e.g., conversion rate) that you want to detect through the test. It represents the threshold where the difference becomes meaningful and actionable.
Why MDE and Impact Estimation Matter
- Realistic Expectations: Helps you set achievable goals based on past data and expected performance.
- Sample Size and Run Time Planning: The smaller the MDE you set, the subtler the conversion-rate changes you aim to detect, which requires more traffic and possibly more time. Conversely, the larger the MDE, the less traffic (and possibly time) is required to finish the test.
- Cost-Benefit Analysis: Allows you to evaluate if the potential impact justifies the time, effort, and resources of running the test.
MDE formula
MDE = (desired conversion rate lift / baseline conversion rate) x 100%
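Note that this formula expresses the MDE relative to the baseline: a 2-percentage-point lift on a 10% baseline is a 20% relative MDE. A tiny sketch of the calculation (the function name is my own):

```python
def relative_mde(baseline: float, absolute_lift: float) -> float:
    """Relative MDE (%) from a baseline rate and an absolute lift.

    baseline:      current conversion rate, e.g. 0.10 for 10%.
    absolute_lift: the percentage-point change you want to detect,
                   e.g. 0.02 for a 2-point lift.
    """
    return absolute_lift / baseline * 100

# A 2-point lift on a 10% baseline is a 20% relative improvement.
print(relative_mde(0.10, 0.02))
```

Be careful which convention your testing tool expects: some calculators take the absolute lift in percentage points, others the relative percentage.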
Common Mistakes to Avoid
- Setting Unrealistic MDEs: Expecting large changes (e.g., a 50% lift) for small tweaks like a button colour change.
- Ignoring Baseline Metrics: Testing without understanding current performance leads to misaligned goals.
- Overlooking ROI: Running tests with very small MDEs on low-traffic pages can be a poor use of resources.
7. Calculate the Appropriate Run Time
An A/B test must run long enough to gather sufficient data and achieve statistical significance. The duration depends on the required sample size and the daily or weekly traffic to the tested variation. Prematurely ending a test or allowing it to run indefinitely can lead to unreliable conclusions and wasted resources.
Best Practices for Run Time
- Account for Daily Variability: Run the test for at least 7 days, even on high-traffic sites, to account for differences in user behaviour across weekdays and weekends.
- Avoid Overextending: Tests that run too long can waste resources and be affected by external factors, such as seasonality or product changes.
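Putting the two rules together, expected run time is simply the required sample size divided by eligible traffic, floored at a full week. A sketch (the traffic figures are made-up illustration values):

```python
from math import ceil

def runtime_days(total_sample_size: int, daily_traffic: int,
                 min_days: int = 7) -> int:
    """Days needed to reach the required sample size across all groups.

    Enforces a minimum of one full week to cover weekday/weekend
    differences in user behaviour.
    """
    return max(min_days, ceil(total_sample_size / daily_traffic))

# Hypothetical: 7,700 users needed in total, 500 eligible visitors per day
print(runtime_days(7700, 500))   # 16 days
# Even with plenty of traffic, never run for less than a week:
print(runtime_days(1000, 5000))  # 7 days
```

If the computed run time stretches to many weeks, that is usually a signal to revisit the MDE or test on a higher-traffic page rather than let the test drag on.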
8. Don’t peek at the results
Checking results frequently during a test is tempting but risky. Early trends are often misleading and can result in premature decisions that invalidate the test.
Why Peeking Is Problematic
- Early data may reflect random fluctuations, not meaningful differences.
- Prematurely stopping a test based on interim results leads to “p-hacking”: repeated significance checks sharply increase the odds of mistaking a random fluctuation for a real effect.
Example: The Risks of Peeking
You run a test comparing two landing pages:
- After 5 days, Variant B shows a 20% higher conversion rate.
- However, by the test’s end, the control outperforms Variant B due to long-term user behaviour patterns.
Stopping the test early based on the initial spike would lead to implementing the wrong page and missing the actual trend.
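The inflation from peeking is easy to demonstrate with a small simulation. Below, both groups have the same true conversion rate (an A/A test), yet checking significance every day flags far more false “winners” than checking once at the end. This is an illustrative sketch that uses a normal approximation for daily conversion counts, not a rigorous sequential analysis:

```python
import random
from math import sqrt
from statistics import NormalDist

def is_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test; True if the difference looks 'significant'."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = abs((conv_b / n_b) - (conv_a / n_a)) / se
    return 2 * (1 - NormalDist().cdf(z)) < alpha

def simulate(n_days=14, daily_users=200, rate=0.10):
    """One A/A experiment. Returns (flagged_at_any_peek, flagged_at_end)."""
    conv, users = [0, 0], [0, 0]
    peeked = False
    for _ in range(n_days):
        for g in range(2):
            # Normal approximation to the binomial daily conversion count.
            mean = daily_users * rate
            sd = sqrt(daily_users * rate * (1 - rate))
            conv[g] += max(0, round(random.gauss(mean, sd)))
            users[g] += daily_users
        if is_significant(conv[0], users[0], conv[1], users[1]):
            peeked = True  # a daily peek would have stopped the test here
    final = is_significant(conv[0], users[0], conv[1], users[1])
    return peeked, final

random.seed(42)
runs = [simulate() for _ in range(500)]
peek_fpr = sum(p for p, _ in runs) / len(runs)
final_fpr = sum(f for _, f in runs) / len(runs)
print(f"false positives with daily peeking: {peek_fpr:.0%}, "
      f"with a single final check: {final_fpr:.0%}")
```

Even though no real difference exists, daily peeking typically “finds” a winner several times as often as the single end-of-test check held at the nominal 5% level.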
A/B testing is one of the most powerful tools for making informed, data-driven decisions—but its success depends on proper planning, execution, and interpretation of results. By focusing on strong hypotheses, understanding statistical principles like significance and power, and accurately calculating sample size and runtime, you can ensure your tests provide reliable, actionable insights.
It’s also important to avoid common mistakes like stopping tests too early, underestimating traffic needs, or peeking at results. These missteps can undermine the reliability of your conclusions and lead to wasted resources.
But testing is only part of the equation. Once your test concludes, analysing the results correctly is crucial to uncover actionable insights and apply them effectively. Stay tuned for my next article, where I’ll cover how to analyse A/B test results, interpret key metrics, and make data-driven decisions that elevate your business.
Let’s schedule a 45-minute introductory call where I can get to know your business, challenges, and objectives—without any commitment from your side.
During this call, we will discuss your current data landscape, key pain points, and how I can help you achieve your goals. After our call, I will develop and share a detailed project plan.

