
Don't Stop Your A/B Tests Partway Through

Ravi Parikh
April 7, 2014 · 5 min read

What do pharmaceutical trials have in common with designing websites and mobile apps? A lot, as it turns out. When we want to test out a new drug, we conduct an experiment to see whether a test group that’s been given the drug outperforms a control group.

And when we want to test changes to our website at scale, we conduct A/B tests to see if our changes end up improving conversion rates.

In a drug trial, the worst-case scenario is a false positive: our experiment makes it seem like the drug is effective, but in reality, it’s no better than (or even worse than) a placebo. This means an ineffective or even harmful drug gets released to the public.

When you conduct A/B tests on your website, there are similar concerns. If you’re not disciplined about how you run and evaluate A/B test experiments, you’ll get false positives that do nothing to improve your website.

In fact, improper A/B testing can even lead you to adopt changes that decrease your conversion rates.

How does this happen? By stopping your tests too early. In drug trials, false positives arise when researchers aren’t disciplined about when to stop a trial, which is why strict “stopping rules” exist to prevent them. The same statistical logic applies to A/B tests, but we often ignore these rules (or aren’t even aware of them in the first place!).

When running an A/B test through a service like Optimizely, it’s easy to check the results while the test is still running. Instead of letting the test run all the way through, many people (especially startups!) save time and money by stopping a test as soon as it has reached statistical significance. Doing this will cause the rate of false positives to skyrocket.

I’ll illustrate how this can happen with an example. At Heap, we ran a simple A/B test a few months ago between two different headlines on our homepage: “No code required” and “Capture everything.” We decided to run the test until 1,000 users had seen each variation. So far, so good. But a few hours after we deployed the test, we saw that the “No code required” variation was winning by a statistically significant margin. We selected it as the new headline without letting the experiment run all the way through.

[Image: A/B test example]

This is where we went wrong. As it turned out, we accidentally left the A/B test running on a portion of our userbase. When we looked at the results 4 days later, after a few thousand more visitors, the two headlines showed no difference in conversion rate. If we had let the experiment run all the way through, the early randomness would have evened out and neither variation would have won. By checking in on the experiment before it finished, we got a false positive.

In many experiments, we set the significance threshold to 5% (a p-value threshold of 0.05). This means we’ll accept that Variation A is better than Variation B only if A beats B by a margin large enough that a false positive would occur just 5% of the time. Phrased another way, Variation A needs to do a lot better than Variation B to be declared the “winner” of the A/B test; if it’s only a little bit better, the difference might just be random chance. This helps us be confident that our experiments are improving conversion rates, and not just making random, useless (or even detrimental) changes to our product.

But if we stop the experiment before it’s over, we effectively relax that 5% constraint, sometimes by a huge amount. The more often we check the experiment (with the intent of stopping it if it shows significance), the more we undermine the value of A/B testing.
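To make the threshold concrete, here is a minimal Python sketch of the kind of two-tailed, two-proportion z-test that typically sits behind these significance calls (my illustration, not Heap’s or Optimizely’s implementation; the counts in the example are made up):

from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-tailed p-value for the null hypothesis that A and B convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error of the difference
    z = (p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))      # normal-approximation p-value

# Hypothetical example: 120/1000 conversions for A vs. 95/1000 for B
print(two_proportion_p_value(120, 1000, 95, 1000))  # ~0.07, so not significant at 0.05

A p-value below 0.05 is what “statistically significant” means throughout the rest of this post.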

I ran some simulated A/B tests to see what would happen if we checked our experiments while they were still running. The simulation was as follows:

  • We have two variations of our product, A and B, and we want to see which converts better.

  • I set up the simulation so that the conversion rate for both variations was exactly 10%. So if an A/B test experiment reported that one variation converted better than the other, then that would be a false positive.

  • I ran both variations against 1000 simulated visitors each, measured the final conversion rate for each variation, and calculated the p-value based on the difference in conversion rates. I set the p-value threshold to 0.05, so that we expect a false positive rate of 5%. Sure enough, when I ran several A/B test simulations, about 5% of them resulted in a false positive.

  • Then I simulated what would happen if we checked for statistical significance midway through, after just 500 visitors had seen each variation (as well as once more at the end). What percentage of simulations now led to false positives? This time, I saw a false positive rate of around 8.4% (out of 100,000 simulations, 8,426 were false positives). Even just one check midway through increased our false positive rate significantly, from 5% to 8.4%.

  • Now I decided to see what would happen if we checked even more often. What if I had checked my Optimizely dashboard every 100 visitors (10 total checks throughout the 1000 visitors), and stopped the A/B test if I saw statistical significance at any one of those checks? What about every 50 visitors? What about every visitor, i.e. we stop the test as soon as we hit statistical significance at all? Here are the results:

Number of checks                        Simulated False Positive Rate
1, at the end (like we’re supposed to)  5.0%
2 (every 500 visitors)                  8.4%
5 (every 200 visitors)                  14.3%
10 (every 100 visitors)                 19.5%
20 (every 50 visitors)                  25.5%
100 (every 10 visitors)                 40.1%
1000 (every visitor)                    63.5%
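
For reference, here is a rough Python reconstruction of the simulation behind this table (a sketch under the stated assumptions, not the original script): both variations truly convert at 10%, each sees 1,000 visitors, and a trial counts as a false positive if any check finds p < 0.05 using the same two-proportion z-test as above.

import random
from math import sqrt, erf

TRUE_RATE = 0.10   # both variations truly convert at 10%, so any "winner" is a false positive
N = 1000           # visitors per variation
ALPHA = 0.05       # significance threshold

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-tailed two-proportion z-test (normal approximation)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions) so far: nothing to call significant
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def peeking_trial(check_every):
    """Run one simulated A/B test; return True if any check declares a (spurious) winner."""
    conv_a = conv_b = 0
    for i in range(1, N + 1):
        conv_a += random.random() < TRUE_RATE
        conv_b += random.random() < TRUE_RATE
        if i % check_every == 0 and p_value(conv_a, i, conv_b, i) < ALPHA:
            return True  # we would have stopped the test here
    return False

trials = 10_000
for check_every in (1000, 500, 200, 100, 50, 10, 1):
    rate = sum(peeking_trial(check_every) for _ in range(trials)) / trials
    print(f"checking every {check_every:4d} visitors: false positive rate ~ {rate:.1%}")

The exact percentages wobble with the random seed and the specific significance test used, but the trend is the same: the more you peek, the more false positives you get.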

This means that if we’re monitoring our A/B test and stopping it as soon as we hit significance, the false positive rate will be over 60%. That’s worse than useless! In cases like this, even a worse variation has a decent chance of winning.

The fix to this problem is simple: don’t stop your A/B tests partway through! Let them run their course, and then determine whether the results are significant.

It is possible to design an A/B test experiment such that it’s okay to stop it before completion, or even just let it run indefinitely until it hits significance. However, the statistics involved are a lot more complicated than the two-tailed test we use in traditional A/B testing.
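
As one illustration of what that extra complexity looks like (a crude alpha-splitting fix for the simulation above, not a full sequential design and not something this post prescribes): if you commit to a fixed number of looks up front, you can test each look against a stricter, Bonferroni-style threshold. This sketch reuses the names (N, ALPHA, TRUE_RATE, p_value) from the simulation above.

def corrected_trial(check_every):
    """Like peeking_trial, but each pre-planned look uses a stricter threshold."""
    looks = N // check_every            # number of planned interim looks
    per_look_alpha = ALPHA / looks      # crude split of the 5% error budget across looks
    conv_a = conv_b = 0
    for i in range(1, N + 1):
        conv_a += random.random() < TRUE_RATE
        conv_b += random.random() < TRUE_RATE
        if i % check_every == 0 and p_value(conv_a, i, conv_b, i) < per_look_alpha:
            return True
    return False

This keeps the overall false positive rate at or below roughly 5% even with frequent looks, but it is conservative and costs statistical power; proper group-sequential methods (such as alpha-spending rules) manage that trade-off more carefully.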

