3 advantages of sequential testing

Which statistical method is best for your experiments? The answer is non-committal: “It depends.” Different statistical methods work best in different scenarios, and each has trade-offs. However, much of the online discourse about these methods is written for the scientific community and overlooks the realities of working inside a business.

In a perfect world, teams could perform tests to the highest statistical rigor, running them for as long as necessary without external pressures. In reality, not all companies have data scientists on tap, and the popularity of agile product development cycles means teams need answers quickly.

Experimentation is also being democratized across all teams, but the guardrails for running those tests aren’t always in place. On top of that, perverse incentives may encourage peeking, and challenging trading environments can make any loss unacceptable.

In this environment, fixed-horizon tests can have some major drawbacks. Sequential testing is becoming increasingly popular as an alternative among large in-house testing teams such as those at Spotify, Netflix, and Booking.com. The reason for its popularity? Mike Fawcett explains:

If you want to conclude tests faster, with lower sample sizes, while peeking at results mid-test, AND have statistical confidence in your results, then you need sequential testing in your life. Traditional statistical methods like z-tests are great, but they come with big downsides for website managers. CRO professionals don’t work in labs. We work in complex, imperfect organizations where decision-making happens fast. Sequential tests are far better equipped to handle this organizational chaos.

Mike Fawcett

Founder at Mammoth Website Optimisation

The following article explores what leading experts think about sequential testing in experimentation and why you should consider using it in your experimentation program.

1 Interim analysis supported with sequential testing

Fixed-horizon tests require teams to set specific pre-test parameters, such as the significance level, statistical power, and Minimum Detectable Effect (MDE). These figures should factor in the potential business impact of the test, but that introduces subjectivity. As a result, the chosen settings can leave teams waiting much longer than needed to reach the predetermined sample size.
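To make the link between these parameters and test duration concrete, here is a minimal sketch of the standard sample-size approximation for a two-sided, two-proportion z-test. The function name, the 5% baseline conversion rate, and the relative MDE values are illustrative assumptions, not figures from this article.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided, two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)       # rate under the assumed MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical inputs: 5% baseline conversion, 5% vs. 10% relative MDE.
n_small_mde = sample_size_per_arm(0.05, 0.05)
n_large_mde = sample_size_per_arm(0.05, 0.10)
print(n_small_mde, n_large_mde)
```

Halving the MDE roughly quadruples the required sample, which is why a cautious choice of MDE can stretch a fixed-horizon test by weeks.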

In reality, this means tests can run for weeks before any decisions can be made. Problems arise when teams “peek” at the results and decide to stop tests early based on what they see at that moment. Fixed-horizon tests aren’t designed to be valid until they are completed, so making decisions partway through increases the chance of a false positive.
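The size of that inflation can be estimated with a short simulation. The sketch below assumes A/A tests (no true difference), equally sized batches of traffic between looks, and a z-statistic with known variance, so each batch adds one standard-normal increment under the null. The function name and parameters are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(n_looks, alpha=0.05, n_sims=5000, seed=7):
    """Share of A/A tests declared significant at ANY of n_looks equally spaced looks."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # unadjusted two-sided threshold
    hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)   # one batch of data under the null
            z = running_sum / sqrt(look)          # z-statistic on all data so far
            if abs(z) > z_crit:                   # naive "stop when significant"
                hits += 1
                break
    return hits / n_sims

single_look = false_positive_rate(1)
ten_looks = false_positive_rate(10)
print(single_look, ten_looks)
```

Under these assumptions a single look stays close to the nominal 5%, while stopping at the first significant result across ten looks pushes the false positive rate well above 10%.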

But with sequential testing, this isn’t an issue. Always-valid p-values or group sequential tests support interim analysis, so you can monitor live tests without increasing the error rate. There are tradeoffs, however, as Ronny Kohavi explains:

In classical online controlled experiments, or A/B tests, an experiment is run to a predetermined fixed horizon (e.g., two weeks), and the p-value is computed once at the end.

Uncontrolled peeking during an A/B test is a serious problem, as it inflates type-I error, but there are well-understood approaches to support controlled peeking using sequential tests, group sequential methods, or always valid p-values.

These techniques result in loss of statistical power, and it is therefore recommended to use two one-sided tests: classical testing for the improvement side and controlled peeking for the negative side. This approach allows maintaining statistical power when experiments are positive, yet quickly aborting negative tests (e.g., due to bugs) to avoid harming users and the business.

Ronny Kohavi

Consultant and Instructor
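Both halves of this tradeoff, controlled error rates and reduced power, can be illustrated with a simulation. The sketch below uses a Bonferroni split of the significance level across looks. This is an assumption chosen for simplicity: it is a conservative stand-in, not the Pocock or O'Brien-Fleming boundaries that group sequential designs typically use, and not an always-valid p-value. The function name and the choice of ten looks are also illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(n_looks, z_crit, final_z_mean=0.0, n_sims=5000, seed=11):
    """Share of simulated tests that cross +/- z_crit at any of n_looks looks.

    final_z_mean is the expected z-statistic at the last look:
    0.0 simulates an A/A test, a positive value simulates a real effect.
    """
    rng = random.Random(seed)
    drift = final_z_mean / sqrt(n_looks)  # mean contribution of each batch
    crossings = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            running_sum += rng.gauss(drift, 1.0)
            if abs(running_sum / sqrt(look)) > z_crit:
                crossings += 1
                break
    return crossings / n_sims

norm = NormalDist()
alpha, n_looks = 0.05, 10
z_fixed = norm.inv_cdf(1 - alpha / 2)                   # single look at the end
z_bonferroni = norm.inv_cdf(1 - alpha / (2 * n_looks))  # stricter bar at every look
effect = z_fixed + norm.inv_cdf(0.80)  # effect giving the fixed test about 80% power

type1_controlled = rejection_rate(n_looks, z_bonferroni)
power_fixed = rejection_rate(1, z_fixed, final_z_mean=effect)
power_controlled = rejection_rate(n_looks, z_bonferroni, final_z_mean=effect)
print(type1_controlled, power_fixed, power_controlled)
```

With the stricter threshold, the false positive rate across all ten looks stays at or below 5%, but the same true effect is detected less often than with a single fixed-horizon analysis. Purpose-built boundaries lose less power than this Bonferroni sketch, though they do not remove the cost entirely.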

2 Increase the speed of decision-making

A major issue for many testing teams is keeping up with the pace of the business. Many leading brands use agile project management with continuous code deployments. Releasing new features into the market can offer a significant first-mover advantage, so there is often enormous pressure on testing teams to make decisions faster. Prateek Parashar shares how the speed advantage of sequential testing can be helpful:

Sequential testing allows us to optimize experiments faster. Unlike fixed-horizon tests, it lets us analyze data as it rolls in. This agility is crucial in fast-paced environments like social media or streaming. If we see a clear winner early on, we can stop the experiment and deploy the better option sooner, maximizing user engagement and growth.

Prateek Parashar

Senior Data Scientist at Twitch

However, there are tradeoffs:

One of the biggest advantages of sequential testing is that it enables quicker decision-making than fixed-horizon testing, where all data must be collected before any analysis is performed. This can be advantageous in situations where timely decisions are crucial. However, it is also more complex to design and implement compared to fixed-horizon testing. Researchers often require careful consideration of stopping rules, sample sizes, and decision criteria, which can make the analysis more intricate.

Anjali Arora Mehra

Senior Director at DocuSign

3 Sequential testing increases ROI

Georgi Georgiev suggests that promoting sequential testing as a way to run more tests faster misses the main point. In his article, Georgi asserts that the biggest benefit of sequential testing is improved business returns:

The primary utility of sequential methods is in delivering an increased return on investment from testing. It comes through stopping A/B tests earlier than their fixed-sample counterparts, on average, realizing larger gains when the true effect is positive and incurring smaller losses from exposure of users to inferior test variants.

Georgi Georgiev

Founder at Analytics-Toolkit.com

Here’s an illustration of this effect in the real world, based on Georgi Georgiev's work:

The business impact of stopping a losing test early

A business is making $1M in revenue per week. Let’s assume a 50% split in traffic. If you stop a losing test 4 weeks earlier than its fixed-horizon counterpart, where the variant had a negative effect of -10%, then you prevent a loss of 5% of weekly revenue (half the traffic losing 10%), equating to $50,000 for each week saved, or $200,000 over the four weeks. If the business is making $10M a week, then the avoided losses would be $500,000 per week.

The business impact of putting a winning test live sooner

In the same business, if you were to stop a winning test 4 weeks earlier than its fixed-horizon counterpart, where the variant had a positive effect of +2%, you would make $40,000 in additional revenue compared to waiting for the fixed-horizon test to complete. If the business is making $10M a week, then the additional revenue would be $400,000.
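The arithmetic behind both scenarios can be sketched as follows. The function names are hypothetical, and the sketch assumes that stopping a losing test returns all traffic to the control, that shipping a winning test moves all traffic to the variant, and that the $50,000 loss figure is a per-week amount while the $40,000 gain covers all four weeks.

```python
def avoided_loss(weekly_revenue, variant_share, relative_effect, weeks_saved):
    """Revenue not lost by ending a losing test early (variant traffic returns to control)."""
    return weekly_revenue * variant_share * abs(relative_effect) * weeks_saved

def extra_gain(weekly_revenue, variant_share, relative_effect, weeks_saved):
    """Revenue gained by shipping a winner early (control traffic also gets the variant)."""
    return weekly_revenue * (1 - variant_share) * relative_effect * weeks_saved

# Losing variant: -10% effect on the 50% of traffic exposed to it.
loss_per_week = avoided_loss(1_000_000, 0.5, -0.10, weeks_saved=1)
loss_four_weeks = avoided_loss(1_000_000, 0.5, -0.10, weeks_saved=4)

# Winning variant: +2% effect extended to the other 50% of traffic for 4 weeks.
gain_four_weeks = extra_gain(1_000_000, 0.5, 0.02, weeks_saved=4)

print(round(loss_per_week), round(loss_four_weeks), round(gain_four_weeks))
```

Under these assumptions the losing variant costs about $50,000 per week of exposure, so four weeks saved avoids roughly $200,000, while shipping the winner four weeks sooner adds about $40,000. Both figures scale linearly with weekly revenue.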

The main advantage of sequential testing

Sequential testing offers some real-world advantages for experimentation teams, and these benefits are significant in a business context. They range from supporting peeking and allowing faster decision-making to generating higher ROI. However, there are tradeoffs.

Deciding which statistical method to use for testing requires further investigation to determine the best option for your business. The tradeoffs vary depending on the framework chosen. For example, one of the cons of sequential testing is that it might require a larger maximum sample size than a fixed-horizon test to achieve the same statistical power.

However, in a real-world scenario, you could pair a sequential test with a fixed sample methodology. This means sequential alerts will notify you when the sequential reliability reaches a given threshold, so any decision made at an interim look is backed by a method designed for peeking. Incorporating this into your fixed sample methodology will unlock new scenarios and allow you to increase the velocity of experimentation.
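One way such a pairing could work is the two one-sided approach Ronny Kohavi describes earlier: peek in a controlled way for harm, and run the classical test for improvement once at the planned horizon. The sketch below is an illustration under stated assumptions, not a description of any particular product's implementation. The function name, the return labels, and the Bonferroni split of the harm-side significance level across looks are all assumptions.

```python
from statistics import NormalDist

def hybrid_decision(z, look, n_looks, alpha_per_side=0.025):
    """Decide what to do at a given look, from the current z-statistic (variant minus control).

    Returns 'stop_for_harm', 'continue', 'ship', or 'no_effect_detected'.
    """
    norm = NormalDist()
    # Harm side: checked at every look, so its alpha is split across looks
    # (Bonferroni, a conservative assumption). The bound is a negative z-value.
    harm_bound = norm.inv_cdf(alpha_per_side / n_looks)
    if z <= harm_bound:
        return "stop_for_harm"
    if look < n_looks:
        return "continue"  # no early stopping for wins before the planned horizon
    # Improvement side: tested once, at the planned horizon, at full alpha.
    win_bound = norm.inv_cdf(1 - alpha_per_side)
    return "ship" if z >= win_bound else "no_effect_detected"

print(hybrid_decision(-3.0, look=1, n_looks=4))  # strongly negative early result
print(hybrid_decision(2.5, look=2, n_looks=4))   # positive, but not yet at the horizon
print(hybrid_decision(2.5, look=4, n_looks=4))   # positive at the horizon
```

The design choice here is that positive results never stop the test early, which preserves the power of the fixed-sample analysis, while a clearly harmful variant can be aborted at any interim look.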


Thanks to the experts who contributed to this article.