Experimentation · Payments · Decision analysis

Killing a payment experiment in time

A proposed P&L saving: drop a high-fee wallet from checkout. The experiment showed it would lose multiples of what it saved - so it stayed.

≈9×

over the break-even kill-switch threshold

2

sequential rounds (broad → targeted)

−5%

checkout-conversion hit in the retest

Kept

the recommendation: do not remove

The client

A subscription business with a high-volume online checkout, selling to both direct consumers and partner-sold members.

The problem

Finance proposed removing a high-fee wallet from checkout to save on transaction fees. The open question: would the conversion lost by removing a method customers actually use outweigh the fee saved? It needed a real experiment and a financial stop rule agreed before launch.

How it ran

A stop rule first, then two rounds, then the root cause - so the recommendation couldn't be relitigated.

Step 1

Break-even rule

Derived the maximum acceptable conversion drop straight from the fee maths, before launch - the kill switch was a number, agreed up front.

Step 2

Broad A/B

Control = all methods; variant = the method removed. A 50/50 split, planned for full traffic, in the highest-exposure market.

Step 3

Kill switch fires

As exposure ramped, the broad test breached the threshold. Called it as soon as it broke and stopped the bleed - the costly version never reached full traffic.
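A minimal sketch of what a threshold check like this could look like. The source does not say which statistical guard was applied, so the one-sided confidence bound, the function names and the sample counts below are all assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def relative_drop(control_conv, control_n, variant_conv, variant_n):
    """Relative conversion drop of variant vs control (positive = variant is worse)."""
    cr_c = control_conv / control_n
    cr_v = variant_conv / variant_n
    return 1.0 - cr_v / cr_c


def kill_switch_fires(control_conv, control_n, variant_conv, variant_n,
                      max_drop, confidence=0.95):
    """True when even the optimistic end of the estimated drop exceeds max_drop.

    Assumption: a normal approximation on the difference in proportions.
    Dividing by the control rate treats that rate as fixed, which is a
    simplification.
    """
    cr_c = control_conv / control_n
    cr_v = variant_conv / variant_n
    se = sqrt(cr_c * (1 - cr_c) / control_n + cr_v * (1 - cr_v) / variant_n)
    z = NormalDist().inv_cdf(confidence)
    lower_abs_drop = (cr_c - cr_v) - z * se  # one-sided lower bound on the absolute drop
    return lower_abs_drop / cr_c > max_drop


# Hypothetical counts: 8.0% control conversion vs 7.4% in the variant.
print(relative_drop(4000, 50_000, 3700, 50_000))
print(kill_switch_fires(4000, 50_000, 3700, 50_000, max_drop=0.006))
```

Checking a running test repeatedly inflates the false-positive rate, so a real monitor would normally pair a check like this with a sequential-testing correction.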

Step 4

Targeted retest

Re-scoped to the segment that looked most resilient and retested at full allocation for a clean, powered read.
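For a sense of what "powered" means here, a rough sample-size sketch using the standard two-proportion z-test approximation. The baseline conversion rate, alpha and power below are assumed values, not figures from the engagement.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_cr, relative_drop, alpha=0.05, power=0.8):
    """Users per arm needed to detect a relative drop with a two-sided z-test."""
    p1 = baseline_cr
    p2 = baseline_cr * (1.0 - relative_drop)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    numerator = (
        z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical: 8% baseline conversion, aiming to detect a 5% relative drop.
print(sample_size_per_arm(0.08, 0.05))
```

Smaller drops need far more users per arm, which is one reason to retest at full allocation within the chosen segment.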

Step 5

Root cause

Micro-conversion funnel: flat upstream, the gap opens exactly at the payment step, with a failure spike on the substitute method.
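A sketch of how step-to-step rates can localise the gap. The event names are snake_case stand-ins for the four funnel steps (begin-checkout, payment-started, submitted, completed), and the counts are made up for illustration.

```python
from collections import Counter

STEPS = ["begin_checkout", "payment_started", "submitted", "completed"]


def step_rates(events):
    """events: iterable of (arm, step) pairs, one per user per step reached.

    Returns {arm: {"a -> b": rate}}, where rate = users at b / users at a.
    """
    counts = Counter(events)
    arms = sorted({arm for arm, _ in counts})
    rates = {}
    for arm in arms:
        rates[arm] = {}
        for prev, nxt in zip(STEPS, STEPS[1:]):
            reached_prev = counts[(arm, prev)]
            rates[arm][f"{prev} -> {nxt}"] = (
                counts[(arm, nxt)] / reached_prev if reached_prev else float("nan")
            )
    return rates


def widest_gap(rates, control="control", variant="variant"):
    """Step transition where the variant falls furthest behind control."""
    return max(rates[control], key=lambda s: rates[control][s] - rates[variant][s])


def expand(arm, reached):
    """Turn per-step user counts into one (arm, step) event per user."""
    return [(arm, step) for step, n in zip(STEPS, reached) for _ in range(n)]


# Hypothetical counts: identical upstream, divergence after payment starts.
events = expand("control", [1000, 800, 700, 650]) + expand("variant", [1000, 800, 640, 590])
rates = step_rates(events)
print(widest_gap(rates))
```

A flat first transition in both arms is what rules out upstream causes; the gap only shows up once the payment step begins.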

Step 6

Readout

"Where did they go?" plus the loss-vs-saving framing, in a leadership readout with one defensible recommendation.

What I did

01. Built the financial kill switch first - derived the maximum acceptable conversion drop purely from the blended-fee maths, so the stop point was a number fixed before launch and applied the moment the data crossed it.
02. Designed and ran the broad A/B (control = all methods, variant = method removed) at a 50/50 split, planned for full traffic, in the highest-exposure market.
03. When the broad test breached the kill switch, called it immediately and stopped it before the costly variant could do real damage at full traffic.
04. Re-scoped to the segment that looked materially more resilient and retested at full allocation to get a clean, properly powered read.
05. Instrumented micro-conversion metrics (begin-checkout → payment-started → submitted → completed) to localise exactly where the friction sat.
06. Validated the root cause: the funnel was flat upstream and the gap opened at the payment step, alongside a payment-failure spike on the substitute method. The upstream funnel and the sample were ruled out as causes.
07. Ran the "where did they go?" analysis - which methods absorbed displaced users, and the share that never converted at all.
08. Framed the lost conversions against the fee saving and delivered a leadership readout with a single, defensible recommendation: keep the method.
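One way the "where did they go?" split can be approximated from arm-level data, sketched under assumptions. Individual would-be users of the removed method can't be identified in the variant, so this compares completed purchases per exposed user, by method, across arms. The method names, data shape and counts are hypothetical.

```python
def where_did_they_go(control, variant, removed_method):
    """Split the removed method's control volume into absorbed vs lost.

    control / variant: {"exposed": int, "completed_by_method": {method: count}}
    All rates are completed purchases per exposed user.
    """
    def rate(arm, method):
        return arm["completed_by_method"].get(method, 0) / arm["exposed"]

    displaced = rate(control, removed_method)
    methods = (
        set(control["completed_by_method"]) | set(variant["completed_by_method"])
    ) - {removed_method}
    # Uplift on each remaining method in the variant = volume it absorbed.
    absorbed = {m: rate(variant, m) - rate(control, m) for m in sorted(methods)}
    lost = displaced - sum(absorbed.values())
    return {
        "displaced": displaced,
        "absorbed": absorbed,
        "lost": lost,
        "lost_share": lost / displaced if displaced else float("nan"),
    }


# Sample figures only.
control = {"exposed": 10_000, "completed_by_method": {"wallet": 150, "card": 700, "bank": 150}}
variant = {"exposed": 10_000, "completed_by_method": {"card": 780, "bank": 170}}
result = where_did_they_go(control, variant, "wallet")
print(result)
```

The `lost_share` figure is the part that never converted at all, which is the number to set against the fee saving.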

Interactive: try the stop rule

When does a fee saving stop paying for itself?

Illustrative model, sample figures only

Inputs: fee on the method you'd remove · share of customers who use it · average fee on the alternatives · observed conversion drop

The break-even drop comes only from the two fee rates - price and lifetime cancel out. The kill switch is that number, set before the test runs.
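A minimal sketch of that arithmetic, with sample figures only. It reads "the two fee rates" as the blended fee with the method and the fee without it, and assumes displaced customers who still convert pay the average alternative fee. Both readings are assumptions, as are the function names.

```python
def blended_fee(share_on_method, method_fee, alt_fee):
    """Average fee rate across all payers while the method is still offered."""
    return share_on_method * method_fee + (1.0 - share_on_method) * alt_fee


def break_even_drop(fee_keep, fee_remove):
    """Largest relative conversion drop the fee saving can absorb.

    Keep:   N * value * (1 - fee_keep)
    Remove: N * (1 - d) * value * (1 - fee_remove)
    Setting the two equal, N and value (price x lifetime) cancel.
    """
    return (fee_keep - fee_remove) / (1.0 - fee_remove)


def net_revenue(conversions, value_per_customer, fee_rate):
    """Revenue after payment fees for a given number of conversions."""
    return conversions * value_per_customer * (1.0 - fee_rate)


# Hypothetical inputs: 6% fee on the method, 2% on alternatives, 15% usage.
method_fee, alt_fee, share = 0.06, 0.02, 0.15
fee_keep = blended_fee(share, method_fee, alt_fee)
fee_remove = alt_fee
threshold = break_even_drop(fee_keep, fee_remove)

observed_drop = 0.05
keep = net_revenue(10_000, 120.0, fee_keep)
remove = net_revenue(10_000 * (1 - observed_drop), 120.0, fee_remove)
print(f"break-even drop: {threshold:.4%}")
print(f"observed / break-even: {observed_drop / threshold:.1f}x")
print(f"net revenue keep={keep:,.0f} remove={remove:,.0f}")
```

Changing the conversion count or the value per customer moves both net-revenue figures but leaves `threshold` untouched, which is why the stop point can be fixed before any data arrives.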

Outputs: break-even drop · observed drop vs. break-even · net revenue if you keep the method · net revenue if you remove it

…

Result

A decision that held, made quickly and cheaply.

The stop rule came first

The maximum acceptable conversion drop was derived from the fee maths and agreed before launch. When the data came in, the call to stop was already made.

Caught early

The broad test breached the threshold while exposure was still ramping, and was stopped as soon as it broke. The version that loses money never reached full traffic.

Root cause confirmed

Flat upstream, the gap opening at the payment step, the substitute method failing more often. The conversion loss traced cleanly to the missing method, and the finding held up in the room.

The number that ended the debate

The loss ran roughly 9× over the break-even threshold. Fee savings can't fund that - so the recommendation was one line, and it stuck.

Tools used

BigQuery · SQL · Python · Statsig · Hex
