Steve Hanov recently suggested a thought-provoking idea: applying multi-armed bandit algorithms to A/B testing. Visual Website Optimizer (VWO) followed up with an interesting statistical analysis of Steve’s method, written so that non-statisticians can follow it, to evaluate the accuracy of his claims. While I loved their analysis, VWO’s conclusion is a howler!
I’m getting ahead of myself, though, because I haven’t explained what multi-armed bandit testing is. In short, you briefly test all variations equally and then show the best-converting variation most of the time, even though you don’t yet have statistically significant data. The other variations only get a small share of the traffic from then on.
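For the code-minded, here is a minimal sketch of one common bandit strategy, epsilon-greedy. It is an illustration rather than Steve's or VWO's actual code: the class name, the 10% exploration rate and the conversion rates in the simulation are all assumptions of mine.

```python
import random


class EpsilonGreedy:
    """Hypothetical epsilon-greedy bandit for choosing a page variation."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon              # assumed share of traffic spent exploring
        self.shows = [0] * n_arms           # times each variation was shown
        self.conversions = [0] * n_arms     # conversions seen for each variation
        self.rng = rng or random.Random()

    def rate(self, arm):
        # Variations never shown get an optimistic 1.0 so each is tried at least once.
        if self.shows[arm] == 0:
            return 1.0
        return self.conversions[arm] / self.shows[arm]

    def choose(self):
        # Explore: occasionally pick a variation at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        # Exploit: otherwise show the best-converting variation so far.
        return max(range(len(self.shows)), key=self.rate)

    def record(self, arm, converted):
        self.shows[arm] += 1
        if converted:
            self.conversions[arm] += 1


def simulate(true_rates, visitors, epsilon=0.1, seed=0):
    """Send simulated visitors through the bandit; return shows per variation."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(len(true_rates), epsilon, rng)
    for _ in range(visitors):
        arm = bandit.choose()
        bandit.record(arm, rng.random() < true_rates[arm])
    return bandit.shows


if __name__ == "__main__":
    # Made-up conversion rates of 5% and 10%, purely for illustration.
    print(simulate([0.05, 0.10], visitors=10_000))
```

Once one variation pulls ahead, the others are shown only during the exploration share of traffic. That is why it takes a bandit longer to collect enough data on the apparent losers to reach statistical significance.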
Anyway, VWO ran the math and concluded:
“[Multi-Armed bandit CRO algos] are very useful in the scenario where you are doing continuous and permanent optimization and are frequently adding/removing versions to be tested. It is also useful in cases where you can afford to wait for a longer duration before knowing with certainty which version is performing the best (or the worst). (Or, in cases where you don’t care about the knowledge and the certainty, and you simply want average conversion rates to be optimized.)”
There are three statements there, which can be boiled down to two:
1. These algorithms are ultimately better than regular split testing for someone who runs a single ongoing test and just keeps adding and removing variations, provided they want high conversion rates and don’t care about the CRO knowledge gained or the certainty of statistical significance.
2. They work for people who can wait longer before knowing with certainty what performs best.
Multi-armed bandit algorithms are AWFUL in both of the above cases, and it’s surprising VWO didn’t realize it based on their own data!
The critical-thinking secret to disproving Steve’s argument is to question his assumptions. Steve assumes a scenario where you’re only ever going to run a single split test, then go home.
In CRO, testing is not set-and-forget. Conversion Rate Optimization is about Continual Revenue [and profit] Optimization.
Who’s got time to set-and-forget and wait super long to get results? That delays me from starting my next test and beating my winner from this test. Whoever said you run a split test once then go home?
Well, Steve said you run a split test once and go home:
“During drug trials, you can only give half the patients the life saving treatment. The others get sugar water. If the treatment works, group B lost out. This sacrifice is made to get good data. But it doesn’t have to be this way.”
The problem drug testers face is that they’re going to run a single study, spend a load of time writing it up for a journal and then spend more time begging for more research grants. It’s a sucky mode of operation (MO), but that’s the constraint on their work (until someone uses their critical thinking to help them continually optimize, too).
But that’s NOT the MO of CRO. CRO’s MO is Continual Revenue Optimization.
Run test A.
Take your winning page, and run test B to try and beat it.
Take the new winner and run test C to beat that…
And that’s why VWO’s conclusion, which accepts that multi-armed bandit algos could EVER be good for CRO, is ridiculous.
According to their own data, random testing (that is, regular split testing, where traffic is divided randomly between variations) would get you results twice as fast as a multi-armed bandit algo, and that’s in a worst-case scenario where you bothered testing insignificant changes. In the best-case scenario, you could expect to get statistically significant data 5-6 times as fast.
In plain language,
– Multi-armed bandits need you to test something for 6 months to provide you optimal CRO results during those 6 months.
– Random testing lets you test something in 1-3 months, and then you can run a different test, and again and again and again…
Take that over the course of a year of split testing, and what you’ve done is traded big gains over the longer term for small gains over the mid term. That doesn’t sound like optimization to me.
Questions? Criticisms? Suggestions? Comment below :D!
P.S. For those working in large research organizations like pharmaceutical companies, the multi-armed bandit isn’t great in all situations either, because it’s modeled on the concept of immediate reward. Drugs like cancer treatments can only be proven effective over the long term. This suggests that there’s a market opportunity for developing a smarter approach to predictive testing, one that measures or forecasts long-term results.
Also, Wikipedia suggests that multi-armed bandits are used in pharma research because of uncertainty about resource allocation, but imho there are broader issues. How long will it take to go from R&D to the market? CRO essentially implements immediately, based on market validation. Science and consumer protection require intermediary steps like the FDA, but there’s a HUGE opportunity for anyone who can challenge the foundational assumptions about pharma testing and speed things up. Perhaps starting with analyzing bottlenecks in the process and thinking how to widen the neck, pop the cork or just go around them entirely…