Checkout Flow Micro-Optimization vs. Macro-Redesign

TL;DR: Most checkout flows do not need to be redesigned. They need a small number of high-leverage friction surfaces removed, an address form that does not punish the user, a clear cost preview before payment, and a payment-method set that matches the customer mix. The Baymard Institute's research on 214 top-grossing US and EU sites finds that the average e-commerce checkout has 39 distinct usability improvements available, and only a fraction of them require any structural redesign. The one-page versus multi-step argument is mostly a distraction; what matters is the friction-per-step ratio and the cognitive load each step imposes. The marginal-lift framework that works ranks interventions by expected lift per engineering hour, and most of the top of the queue is micro-optimisation.

A note on tools and brands. The Baymard Institute, Booking.com, ASOS, Amazon, Shopify, Stripe, Adyen, and the various conversion-optimisation toolkit vendors appear throughout this essay as the available public-research and case-study sources. Jakob Nielsen, Christian Holst, Karen Holst, Peep Laja, and Andre Morys appear as named practitioners whose public work informs the discussion. Quantitative claims framed as advisory-engagement observation come from anonymised partner operators in mid-market and enterprise e-commerce, not from the named companies. Public claims are attributed inline.

Why the Checkout Is the Wrong Place to Be Ambitious

The checkout flow is the highest-stakes surface in most e-commerce operations. Every percentage point of conversion lift at checkout converts directly to revenue, and the population reaching checkout is already past most of the upstream qualification work. A change that lifts checkout conversion by 1 percent (relative) on a site doing $200 million in annual revenue is worth approximately $2 million per year, assuming traffic and order value hold steady, and the change might cost two weeks of engineering effort. The leverage is obvious; the trap is that the same leverage attracts the wrong kind of intervention.
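
The $2 million figure is simple arithmetic. A minimal sketch, assuming the 1 percent is a relative lift and that traffic and average order value do not change, so revenue scales linearly with checkout conversion (the function name is hypothetical):

```python
def annual_value_of_lift(annual_revenue: float, relative_lift: float) -> float:
    """Incremental annual revenue from a relative checkout-conversion lift.

    Assumes traffic and average order value stay fixed, so revenue scales
    linearly with the checkout conversion rate.
    """
    return annual_revenue * relative_lift


# A 1 percent relative lift on $200 million of annual revenue.
value = annual_value_of_lift(200_000_000, 0.01)
print(f"${value:,.0f} per year")
```

The linearity assumption is the weak point: if the lift comes mostly from low-value orders, the revenue impact is smaller than the conversion impact.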

Most checkout redesign projects in our advisory experience produce smaller lifts than the equivalent budget spent on targeted micro-optimisation. The reason is not that redesigns are bad in principle; it is that redesigns introduce many simultaneous changes, which makes the experimentation hard to read, the regressions hard to isolate, and the partial wins hard to attribute. A site that ships a redesigned checkout and sees a 1.5 percent lift cannot tell you which of the 18 changes drove the lift, which of them were neutral, and which of them were negative and were masked by the wins. The same site that ships 18 micro-tests sequentially over six months produces a cleaner attribution map and often produces a larger cumulative lift, because the negative changes get killed instead of riding along.

The Baymard Institute's body of research, the most comprehensive public dataset on e-commerce checkout usability, has documented the pattern for over a decade. Baymard's benchmark of 214 top-grossing US and EU e-commerce sites against 134 guidelines, with the resulting database of approximately 7,800 manually reviewed elements across 440 distinct checkout steps, has consistently found that the average site has 39 distinct usability improvements available in its current flow. Most of those improvements are micro: a clearer label, a better-validated form field, a more transparent cost preview, a payment-method icon set that matches the local market. Very few of them require a structural redesign.

Contrary to the Conventional View

Conventional view

A redesigned checkout flow will produce a larger conversion lift than incremental optimisation.

What the evidence shows

The win rate from full checkout redesigns in our advisory partner data is lower than the win rate from disciplined micro-optimisation programmes over the same time horizon, and the average magnitude of the win is smaller when normalised against engineering cost. Redesigns produce a small number of decisive failures (the new flow tests worse than the old one, and the project rolls back at significant cost), a smaller number of decisive wins (the new flow is materially better and the win is large), and a large middle of ambiguous results where the change moves the metric a little but the multivariate intervention makes it impossible to know what worked. Micro-optimisation produces a noisier individual-test distribution but a cleaner cumulative result.

What the Baymard Research Actually Finds

The Baymard Institute's published research over the past decade has converged on a small number of operating findings that have held up across multiple study iterations and across the major e-commerce categories.

The first finding is that the average documented cart abandonment rate is 70.19 percent, aggregated across the studies Baymard has tracked from 2006 to 2023. The headline number obscures large variation by category, by traffic source, and by device, but the central estimate is robust. The implication is that approximately 7 out of every 10 customers who add an item to a cart do not complete the purchase, which sets the size of the prize for any checkout optimisation effort.

The second finding is that approximately 22 percent of abandonment in Baymard's representative survey work is attributable to "the website wanted me to create an account," which has placed the guest-checkout availability question at the top of the high-leverage micro-optimisation list for years. Sites that block guest checkout, or that bury the guest option behind the account-creation flow, leave material conversion on the table.

The third finding is that account creation friction is one of several recurring categories of high-leverage friction. The others include unexpected costs revealed late (shipping costs, taxes, fees that appear only on the final step), required form fields that should be optional (company name, second address line, phone number), payment methods that do not match the customer mix in the operator's market, and address forms that fight the user (poor autocomplete, restrictive validation, country-specific format issues).

The fourth finding is that the typical checkout has many small friction sources rather than one large one. The benchmark scoring of 7,800 elements across 440 checkout steps in the Baymard database reveals a long-tailed distribution: most checkouts have a small number of severely problematic elements (which deserve immediate attention) and many moderately problematic elements (which deserve disciplined ongoing optimisation), rather than one structural problem that a redesign could solve.

The pattern that emerges from the Baymard work is the empirical justification for the micro-optimisation posture. If the typical checkout has 39 distinct improvements available, and each marginal improvement adds 0.05 to 0.4 percentage points of conversion, the cumulative effect of a disciplined six-to-twelve-month optimisation programme that ships 15 to 25 of the available improvements is substantially larger than the typical result from a redesign that bundles 8 to 12 of the same changes into a single ship.
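
A quick check of the ranges those figures imply, treating each improvement's lift as additive in percentage points. That is an assumption: real interventions overlap and interact, so the upper bounds are optimistic for both approaches, and the redesign figure ignores any bundled losers that drag the total down.

```python
def cumulative_pp_range(n_shipped, pp_per_change):
    """Cumulative lift range in percentage points.

    n_shipped: (min, max) number of improvements shipped.
    pp_per_change: (min, max) lift per improvement, in percentage points.
    Assumes lifts add independently, which overstates the total when
    changes overlap.
    """
    (n_lo, n_hi), (pp_lo, pp_hi) = n_shipped, pp_per_change
    return n_lo * pp_lo, n_hi * pp_hi


per_change = (0.05, 0.4)  # percentage points per improvement, from the text
micro = cumulative_pp_range((15, 25), per_change)
redesign = cumulative_pp_range((8, 12), per_change)
print(f"micro programme: {micro[0]:.2f} to {micro[1]:.2f} pp")
print(f"redesign bundle: {redesign[0]:.2f} to {redesign[1]:.2f} pp")
```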

The fifth finding from the Baymard body of work, less often quoted but operationally important, is that mobile checkout conversion lags desktop by a wide margin even on sites that have invested heavily in mobile optimisation. The Baymard mobile benchmarking has consistently found that mobile-specific friction (the smaller input fields, the keyboard switching, the context loss when the payment app takes over the screen, the differential autofill behaviour across browsers and operating systems) costs operators meaningful conversion on the device population that now represents the majority of traffic in most categories. The implication is that the micro-optimisation queue should be device-segmented, with the mobile-specific items often producing larger absolute lift than the desktop equivalents.

The Friction Categories That Move Conversion

Across the public research and the advisory partner data, the friction categories that consistently move checkout conversion fall into a manageable set. Naming them is the first step toward a structured optimisation queue.

The first category is guest-checkout availability. Sites that require account creation before purchase typically lose 5 to 15 percent of would-be conversions to abandonment, with the share rising for new-to-brand visitors and falling for repeat customers. The fix is to expose guest checkout as the default option, with account creation offered post-purchase as a convenience rather than a gate. The engineering cost is moderate (the account-creation system needs to handle the case where an order is placed without a pre-existing account), and the lift is typically the largest of any single intervention available.

The second category is cost transparency. Customers who reach the final payment step and discover shipping costs, taxes, or fees they had not seen earlier abandon at a much higher rate than customers who saw the total cost throughout the flow. The fix is to surface estimated shipping and taxes as early as the cart page, or to commit to free shipping above a clearly displayed threshold. The engineering cost varies (estimated shipping requires the address or at least the postal code, which adds a step earlier in the flow), and the lift in partner data is typically in the 1 to 4 percent range.

The third category is the address form. Address fields that fight the user (no autocomplete, restrictive validation, country-specific format issues, fields that block on optional information) introduce friction at the highest-stakes point in the flow. The fix is autocomplete (Google Places, Loqate, or the equivalent), generous validation rules, and clear country-specific formats. The engineering cost is moderate and the lift in partner data is typically in the 0.5 to 2 percent range.
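
As an illustration of "generous validation", here is a minimal sketch assuming a US-only ZIP field; the function name and the set of accepted separators are assumptions, not a reference implementation. It strips the separators people commonly type and accepts both five- and nine-digit forms, instead of rejecting anything that does not match one exact pattern.

```python
import re
from typing import Optional

# Five digits, optionally followed by the four-digit ZIP+4 extension.
_ZIP_DIGITS = re.compile(r"\d{5}(\d{4})?")


def normalise_us_zip(raw: str) -> Optional[str]:
    """Return a canonical ZIP ('12345' or '12345-6789'), or None if the
    input cannot be a ZIP code once common separators are removed."""
    digits = re.sub(r"[\s\-.]", "", raw)  # drop spaces, hyphens, and dots
    if not _ZIP_DIGITS.fullmatch(digits):
        return None
    return digits if len(digits) == 5 else f"{digits[:5]}-{digits[5:]}"


for raw in ["12345", " 12345-6789 ", "12345 6789", "123456789", "1234"]:
    print(repr(raw), "->", normalise_us_zip(raw))
```

The same idea generalises per country: normalise first, and only reject input that cannot be valid after normalisation.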

The fourth category is the payment-method set. Customers who do not see their preferred payment method (local debit network in regions where it dominates, BNPL options for the relevant categories, digital wallets that match the device population) abandon at higher rates than customers whose method is visible. The fix is to support the local mix; the engineering cost is the integration cost for each method, and the lift varies widely by market.

The fifth category is mobile-specific friction. Mobile checkouts have additional friction surfaces (small input fields, keyboard switching, inconsistent autofill, the context switch when a payment app takes over the screen) that desktop checkouts do not. The fix is mobile-first design that uses native autofill, proper input types (numeric keypads for card-number fields, email keyboards for email fields), password-manager integration, and clear progress indicators. The lift on mobile is typically larger in absolute terms because the friction is larger.
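
The input-type point can be made concrete. The sketch below renders checkout `<input>` tags with keyboard and autofill hints; the `type`, `inputmode`, and `autocomplete` values (`email`, `tel`, `numeric`, `cc-number`, `cc-exp`, `cc-csc`, `postal-code`) are standard HTML attribute values, while the field names and the Python rendering helper are hypothetical.

```python
from html import escape

# Hypothetical checkout fields mapped to standard HTML keyboard and
# autofill hints.
FIELD_ATTRS = {
    "email":       {"type": "email", "autocomplete": "email"},
    "phone":       {"type": "tel",   "autocomplete": "tel"},
    "card_number": {"type": "text",  "inputmode": "numeric", "autocomplete": "cc-number"},
    "card_expiry": {"type": "text",  "inputmode": "numeric", "autocomplete": "cc-exp"},
    "card_cvc":    {"type": "text",  "inputmode": "numeric", "autocomplete": "cc-csc"},
    # Postal codes are alphanumeric in many countries (UK, Canada), so
    # this field deliberately does not force a numeric keypad.
    "postal_code": {"type": "text",  "autocomplete": "postal-code"},
}


def render_input(name: str) -> str:
    """Render an <input> tag carrying the keyboard and autofill hints for a field."""
    attrs = {"name": name, "id": name, **FIELD_ATTRS[name]}
    parts = " ".join(f'{key}="{escape(value)}"' for key, value in attrs.items())
    return f"<input {parts}>"


print(render_input("card_number"))
```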

The sixth category is error handling. Forms that reject input without clear explanation, that lose the user's work on submission failure, or that produce vague error messages introduce abandonment at the moment of greatest emotional commitment. The fix is inline validation, clear error messaging, and field-level error recovery. The engineering cost is moderate; the lift is typically in the 0.3 to 1.5 percent range.
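
A minimal sketch of field-level validation that preserves the user's input, assuming a form arriving as a dictionary of strings; the field names, rules, and messages are illustrative, not a complete validator. Each field gets its own specific message, optional fields never block submission, and every value is echoed back so a failed submit does not wipe the form.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldResult:
    value: str                   # the user's input, echoed back so nothing is lost
    error: Optional[str] = None  # specific, actionable message; None means valid


def validate_checkout(form: Dict[str, str]) -> Dict[str, FieldResult]:
    """Validate each field independently and return a result per field."""
    results: Dict[str, FieldResult] = {}

    email = form.get("email", "").strip()
    local, at, domain = email.rpartition("@")
    if not email:
        email_error = "Enter an email address so we can send your receipt."
    elif not at or not local or "." not in domain:
        email_error = "This email address looks incomplete; it should look like name@example.com."
    else:
        email_error = None
    results["email"] = FieldResult(email, email_error)

    name = form.get("full_name", "").strip()
    results["full_name"] = FieldResult(
        name, None if name else "Enter the name to print on the delivery label."
    )

    # Optional field: recorded, but never blocks submission.
    results["company"] = FieldResult(form.get("company", "").strip())
    return results


results = validate_checkout({"email": "jane@example", "full_name": "Jane Doe"})
for field_name, result in results.items():
    print(field_name, repr(result.value), result.error or "ok")
```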

The Six High-Leverage Friction Categories at Checkout

Category	Typical Lift Range	Engineering Cost	Common Failure Mode
Guest-checkout availability	1.5-5% lift on guest-eligible traffic	Moderate	Account-creation gate blocks the highest-intent buyers
Cost transparency (shipping, taxes, fees)	1-4% lift	Moderate to high	Final-step fee reveal triggers abandonment after sunk-cost buildup
Address-form usability	0.5-2% lift	Low to moderate	No autocomplete, restrictive validation, country-format issues
Payment-method coverage	0.5-3% lift, varies by market	Per-method integration	Missing local debit, BNPL, or wallet relevant to customer mix
Mobile-specific friction	1-5% mobile lift	Moderate	Desktop-first design adapted to mobile rather than mobile-first
Error handling and validation	0.3-1.5% lift	Low to moderate	Vague errors, lost work on submission, no inline validation

The six categories cover the large majority of available lift in most checkouts. A disciplined audit against these categories typically produces a queue of 8 to 20 specific interventions, each scoped, prioritised by expected lift over engineering cost, and ready to ship into the test infrastructure.

A seventh category, which sits at the boundary between checkout-internal work and the broader site experience, is the cart-to-checkout transition. The cart page is the first checkout-flow surface most users see, and the design choices there (the prominence and clarity of the checkout button, the visibility of trust signals, the cost summary that should match what appears later in the flow, the upsell and cross-sell prompts that should not distract from the conversion path) materially affect the rate at which cart visitors enter the checkout proper. The cart-to-checkout dropoff is often the largest single drop in the funnel and deserves explicit attention rather than being treated as a pre-checkout problem.

An eighth category that is increasingly relevant is the account-and-identity surface. Customers who have an account but are not signed in (the most common state at checkout) need a frictionless sign-in path; customers who do not have an account need the guest path and a quick post-purchase account-creation option; customers who use social or single-sign-on need the appropriate authentication. The identity surface design has measurable effects on conversion and on the downstream retention rate, and it is often underfunded because the engineering complexity sits at the intersection of the commerce team and the identity team.

The One-Page vs. Multi-Step Debate

The most-discussed structural question in checkout design is whether one-page or multi-step checkout produces higher conversion. The discourse has been intense for over a decade and the empirical answer is more nuanced than either side typically acknowledges.

The one-page argument is that fewer page loads reduce abandonment, that the customer sees the entire commitment up front, and that mobile users particularly benefit from not having to navigate between steps. The multi-step argument is that breaking the form into discrete steps reduces cognitive load, that the progress indicator builds momentum, and that the per-page form length is more digestible.

The empirical evidence from the Baymard work and from the partner-data testing we have run is that both formats can produce high-converting checkouts and both can produce low-converting ones, and the format choice matters less than the within-format execution. A well-executed multi-step checkout (clear progress, easy back-navigation, persistent cart visibility, intelligent field grouping) typically converts at a similar rate to a well-executed one-page checkout (clean visual hierarchy, fast inline validation, smooth conditional reveal, no scroll fatigue). Both formats can be ruined by poor execution of the underlying friction categories.

The strategic implication is to choose the format on the basis of what the engineering team can execute well, not on the basis of a categorical preference. A team comfortable with single-page applications, fast inline validation, and dynamic form behaviour can execute one-page well. A team more comfortable with traditional page-based flows can execute multi-step well. The format choice is secondary; the friction-per-step execution is primary.

Checkout Conversion by Format and Execution Quality, Practitioner Estimate

The chart's central observation is that the two well-executed variants (one-page and multi-step) sit much closer to each other than either does to its poorly executed counterpart. The format-versus-format gap is small; the execution gap within each format is large. Operators who spend their optimisation budget on choosing the right format and skimp on the within-format execution tend to under-perform operators who pick either format and execute the friction categories well.

A useful frame from the published Baymard work is that a multi-step flow with a persistent order summary panel functions perceptually like a one-page flow with horizontal navigation, and a one-page flow with progressive disclosure functions perceptually like a short multi-step flow. The two formats converge in execution when both are designed against the same friction principles. The discourse around the format choice is therefore less about which format is structurally better and more about which set of design idioms a particular team executes more naturally.

Cumulative Lift Over 12 Months: Micro-Optimisation vs. Redesign Programmes (Practitioner Estimate)

The scatter shape that recurs across partner engagements is consistent: micro-programmes cluster in the low-hours, positive-lift region and grow approximately linearly with effort, while redesign programmes cluster in the high-hours, uncertain-lift region, with a meaningful share producing negative outcomes that have to be rolled back. The expected-value calculation favours micro-programmes by a wide margin in most operator situations, with the redesign case being defensible only in the specific forcing situations covered later.

The Booking.com and ASOS Patterns

The public CRO conference circuit has been generous over the years in showing how the largest e-commerce operators run their checkout optimisation. The Booking.com and ASOS public talks are particularly instructive because both operators have built strong experimentation cultures and have shared substantive operating detail.

Booking.com's public discussion of its experimentation programme (across numerous Stuart Frisby and Lukas Vermeer talks at industry conferences, the blog posts the company has published, and the academic papers Vermeer co-authored on Booking's experimentation infrastructure) has emphasised the discipline of running thousands of small experiments per year and accepting that most of them will produce flat or negative results. The aggregate lift comes from the small minority of experiments that produce a measurable win, accumulated over time. The discipline that matters is to run the experiments cleanly, kill the losers quickly, and ship the winners.

The ASOS public talks (including the various conference appearances by their CRO and product teams) have emphasised mobile-first execution and the importance of getting the basics right before pursuing exotic optimisations. The ASOS checkout has converged over multiple iterations toward a pattern of clear cost preview, a prominent guest-checkout option, comprehensive payment-method coverage for the markets they operate in, and tight error handling. The lessons that have made it into the public talks are less about radical innovation and more about disciplined incremental improvement.

The common thread across the published case studies from the largest operators is the same: high-volume experimentation, micro-optimisation as the default mode, and structural redesigns reserved for specific situations (new platform launch, new market entry, regulatory requirements that force a structural change). The operating posture is the opposite of the consulting-driven "let's redesign the checkout" project that dominates mid-market operator decision-making.

Amazon's published commentary on its checkout (across Jeff Bezos's shareholder letters and interviews with members of the consumer team) has emphasised the same theme from a different angle: the one-click checkout patent and the broader effort to reduce per-transaction friction were the result of continuous incremental work, not a single redesign push. The friction-reduction posture has compounded over two decades into a structural moat that competitors have not been able to match by running a single redesign project, because the underlying capability (the saved payment methods, the address book, the dispute-handling infrastructure, the fraud model that lets one-click work) is itself the result of compounded micro-investment rather than a single architectural decision.

The Marginal-Lift Framework

The same logic that applies to content refresh prioritisation applies to checkout optimisation prioritisation. The framework ranks candidates by expected lift per engineering hour, with adjustments for strategic value and for risk.

Step one is the audit-driven candidate list. Run a structured audit against the six friction categories, plus the operator-specific elements (anything that is unusual about the operator's flow, market, or customer mix). The audit produces 20 to 60 candidate interventions of varying scope.

Step two is the expected lift estimate. For each candidate, estimate the conversion lift on the traffic segment it affects, based on the public research, the partner data, and the operator's own experimentation history. The estimates are necessarily approximate; the framework accommodates the imprecision by ranking on order-of-magnitude differences rather than on small numerical distinctions.

Step three is the engineering cost estimate. For each candidate, estimate the engineering hours required to ship, including design, development, QA, and any platform-specific work. The estimate should include the test infrastructure work if the change requires new tracking.

Step four is the strategic value adjustment. Some interventions have value beyond the immediate conversion lift. A change that improves accessibility (which reduces legal exposure in the EU and the US), that supports a new market expansion, that reduces customer support load, or that increases the long-term retention rate has additional value that the immediate-conversion calculation misses.

Step five is the risk adjustment. Some interventions have downside risk that the upside-lift calculation does not capture. A payment-method change that could fail in production has revenue-loss risk. A privacy-policy change that could trigger regulator attention has compliance risk. The risk should be priced explicitly into the ranking.

The composite score is roughly: score = (lift on segment) × (segment traffic) ÷ (engineering hours) × (strategic multiplier) × (risk multiplier), which makes it an expected value per engineering hour rather than an absolute expected value. Stack-rank candidates by composite score; ship the top of the queue first.
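
A minimal sketch of the scoring and stack-ranking, assuming the lift is expressed as an absolute change in conversion rate so that lift times sessions gives incremental orders. The class, the field names, and every input number below are hypothetical and chosen only to show the mechanics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    abs_lift: float              # expected change in conversion rate (0.001 = 0.1 pp)
    segment_sessions: int        # annual checkout sessions in the affected segment
    eng_hours: float             # design, development, QA, and tracking work
    strategic_mult: float = 1.0  # above 1.0 for accessibility, new markets, support savings
    risk_mult: float = 1.0       # below 1.0 for production or compliance risk

    def score(self) -> float:
        """Expected incremental orders per engineering hour, with multipliers."""
        orders = self.abs_lift * self.segment_sessions
        return orders / self.eng_hours * self.strategic_mult * self.risk_mult


def stack_rank(candidates: List[Candidate]) -> List[Candidate]:
    return sorted(candidates, key=lambda c: c.score(), reverse=True)


# Invented inputs, for illustration only.
queue = [
    Candidate("Guest checkout as default", 0.004, 400_000, 160),
    Candidate("Inline field validation", 0.001, 1_000_000, 60),
    Candidate("Address autocomplete", 0.0015, 1_000_000, 80, strategic_mult=1.1),
    Candidate("Regional BNPL option", 0.003, 250_000, 200, risk_mult=0.8),
]
for c in stack_rank(queue):
    print(f"{c.score():7.2f}  {c.name}")
```

Because the estimates are rough, only order-of-magnitude gaps between scores should drive decisions; near-ties are better broken by strategic judgement.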

Checkout micro-optimisation prioritisation workflow


The workflow is iterative. As experiments ship and the operator learns which interventions perform best in its specific context, the prioritisation calibrates. The framework's value is not in producing a precise ranking on the first pass; it is in forcing the comparison of like-with-like and in preventing the engineering team from spending capacity on changes that intuitively feel important but do not move the metric.

When Redesign Actually Makes Sense

The argument for micro-optimisation as the default is not an argument that redesigns are never appropriate. There are specific situations where a redesign is the right call.

The first is the platform migration. An operator moving from a legacy commerce platform to a modern one, or rebuilding the checkout on a different framework for performance reasons, has no choice; the redesign is structural. The optimisation discipline still applies (the new platform's checkout should be designed against the six friction categories from day one), but the redesign decision is forced by the platform change.

The second is the regulatory requirement. New privacy regulations, accessibility requirements, or payment-flow rules can force structural changes that cannot be accommodated by micro-optimisation alone. The PSD2 Strong Customer Authentication (SCA) requirements in Europe, the various accessibility-litigation pressures in the US, and the data-protection requirements in multiple jurisdictions have all produced cases where a structural redesign was the right response.

The third is the multi-market expansion. An operator expanding from a single-market checkout to a multi-market one often discovers that the existing flow does not generalise. The address format assumptions, the payment-method assumptions, the tax-display assumptions, and the regulatory-disclosure assumptions all need to be rebuilt. The redesign is necessary; the optimisation discipline applies within each market's instantiation.

The fourth is the structural conversion ceiling. An operator who has run a disciplined micro-optimisation programme for 12 to 24 months and is no longer producing measurable lift may have reached the structural limits of the existing flow. The diagnostic is that the experiment win rate has dropped to noise levels, the average win magnitude has compressed, and the remaining interventions in the queue are all small. At that point, a structural rethink may unlock the next phase. The redesign should be informed by the learning from the micro-optimisation programme; the data from the dozens of experiments tells the redesign team which directions are promising and which to avoid.
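
The ceiling diagnostic can be made explicit. The sketch below compares the recent win rate with the share of tests expected to "win" by chance alone (alpha / 2 for the positive tail of a two-sided test) and checks whether average win size has compressed. The `noise_multiple` and `min_avg_win` thresholds are illustrative assumptions, not calibrated values.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass
class ExperimentResult:
    significant_win: bool  # positive and significant at the programme's alpha
    lift: float            # measured relative lift, e.g. 0.012 for 1.2 percent


def ceiling_signals(results: Sequence[ExperimentResult], alpha: float = 0.05,
                    noise_multiple: float = 2.0,
                    min_avg_win: float = 0.005) -> Dict[str, object]:
    """Summarise the structural-ceiling signals over a window of recent tests."""
    if not results:
        raise ValueError("need at least one experiment result")
    wins = [r.lift for r in results if r.significant_win]
    win_rate = len(wins) / len(results)
    noise_rate = alpha / 2  # expected positive false-win rate if nothing works
    avg_win = mean(wins) if wins else 0.0
    return {
        "win_rate": win_rate,
        "noise_rate": noise_rate,
        "avg_win": avg_win,
        "at_noise_level": win_rate <= noise_multiple * noise_rate,
        "wins_compressed": avg_win < min_avg_win,
    }
```

Both flags firing over several consecutive quarters is the pattern the text describes; one quiet quarter is not.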

The fifth case is the strategic-positioning redesign. Some operators redesign the checkout because the existing flow no longer reflects the brand positioning the operator wants to communicate, because a competitor has shipped a notably differentiated experience that requires a strategic response, or because the operator is repositioning into a higher-end or lower-end market segment that calls for a different experience. The strategic-positioning case is hard to evaluate on conversion metrics alone because the redesign's purpose is not primarily conversion; it is brand and positioning. Operators in this case should still measure conversion impact carefully and should still use the micro-optimisation discipline for the post-launch refinement, but the redesign decision itself is justified on broader grounds than the conversion-lift estimate.

Operating the Programme

The checkout optimisation programme as an organisational practice needs a few standing properties to compound. The first is a dedicated experimentation infrastructure: an A/B testing platform (in-house or vendor), a tagging discipline that captures the right events, an analyst function that reads the results, and a deployment pipeline that ships winning experiments to production cleanly.

The second is a statistical discipline. Most operators run experiments with insufficient power, declare wins on misleading early data, and accept multiple-comparison contamination across simultaneous tests. The discipline that matters is to set sample-size targets in advance, to honour them, to use sequential testing where appropriate, and to apply the corrections that the statistics literature has documented for the situation. The Booking.com talks on experimentation methodology have been particularly clear on the magnitude of error introduced by sloppy statistics.
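
Setting sample-size targets in advance can be done with the standard normal-approximation formula for a two-sided, two-proportion z-test. A minimal sketch using Python's `statistics.NormalDist`; the function name, the 40 percent baseline, and the 2 percent minimum detectable effect in the example are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Visitors needed per arm for a two-sided two-proportion z-test.

    p_baseline: control conversion rate.
    relative_mde: smallest relative lift worth detecting (0.02 = 2 percent).
    """
    p1 = p_baseline
    p2 = p_baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


n = sample_size_per_arm(0.40, 0.02)
print(f"{n:,} sessions per arm")
```

Halving the detectable effect roughly quadruples the required sample, which is why small checkout tweaks on low-traffic sites often cannot be tested to a clean result.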

The third is a culture of accepting null results. The experiment that produces a flat or negative result is information; it eliminates an intervention from the queue and frees the team to test the next one. Operators who only count "wins" in their experimentation programme produce both publication bias in their internal reporting and a chilling effect on the willingness to test things that might fail. The win rate in mature experimentation programmes is typically 15 to 25 percent; the operator who expects 70 percent has either inflated expectations or is misreading the results.

The fourth is the discipline of running enough experiments. A programme running 1 to 3 tests per quarter produces too few data points to learn from, and the cumulative lift from the small set of wins is modest. A programme running 8 to 20 tests per quarter produces a richer learning rate and a meaningfully larger cumulative lift over the same period. The operating cost of higher-throughput testing is mostly in the analyst function and in the design and engineering capacity to feed the queue.

Checkout Experimentation Programme: Healthy vs. Theatre Patterns

Property	Healthy Pattern	Theatre Pattern	Diagnostic
Tests per quarter	8-20 well-powered tests	1-3 tests, often under-powered	Test count per quarter is too low
Win rate	15-25% of tests show measurable wins	Reported 60-80% wins, often spurious	Improbable win rate suggests bad statistics or selective reporting
Average win magnitude	0.5-3% conversion lift per winning test	Reported 8-25% lifts; rarely replicate	Large reported wins often do not survive holdout validation
Loser kill rate	Losers killed within 2-4 weeks	Losers linger; multiple-comparison contamination accumulates	Tests run open-ended past sample-size targets
Cumulative annual lift	5-15% over a year of disciplined work	Reported 20-40% but unconfirmed in revenue	Bottom-line revenue should reflect claimed cumulative lift
Statistical method	Pre-registered sample sizes, sequential or Bayesian methods, multiple-comparison correction	Frequentist tests stopped early, no correction, intuitive sample sizes	Method documentation absent or hand-wavy
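
For the multiple-comparison correction in the healthy pattern, one standard choice is Holm's step-down procedure, which controls the family-wise error rate across a batch of concurrent tests. This is a minimal sketch of that swapped-in technique, not a claim about any particular operator's method; the test names and p-values are invented.

```python
from typing import Dict


def holm_bonferroni(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm's step-down procedure; returns True for hypotheses rejected
    while controlling the family-wise error rate at alpha."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            break  # once one test fails, every larger p-value is retained
    return rejected


batch = {"A": 0.004, "B": 0.03, "C": 0.012, "D": 0.20}
print(holm_bonferroni(batch))
```

Uncorrected, three of the four tests would be reported as wins at p < 0.05; after correction only A and C survive.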

The theatre pattern is the more common one in mid-market operators, and the gap between the two is mostly a discipline gap rather than a tooling gap. The same A/B testing platform that runs disciplined experiments for one operator runs theatre for another; the difference is in how the platform is used.

Most checkouts have years of micro-optimisation runway available. The redesign-because-we-are-stuck argument is usually a misdiagnosis; the structural ceiling appears later than most operators believe.

The compounding effect of a healthy programme is large. A site that ships 12 winning experiments over a year, with a median win of 1.2 percent and a small standard deviation, has compounded roughly a 12 to 18 percent conversion lift over the year from a programme that costs perhaps two analysts and the corresponding engineering capacity. The same site that runs a single redesign in the same year, with the redesign producing a 4 percent measured lift after two months of validation, has spent more capacity for a smaller result and has lost the opportunity cost of the experiments not run.
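
The compounding arithmetic, as a minimal sketch: relative lifts combine multiplicatively, so twelve 1.2 percent wins give slightly more than their simple sum. It assumes each measured lift persists in production and does not overlap with the others, which is exactly what holdouts and post-launch monitoring exist to check.

```python
import math


def compounded_lift(relative_lifts):
    """Combine relative lifts multiplicatively: (1 + l1)(1 + l2)... - 1."""
    return math.prod(1 + lift for lift in relative_lifts) - 1


def additive_lift(relative_lifts):
    """Naive sum of the individual lifts, for comparison."""
    return sum(relative_lifts)


wins = [0.012] * 12  # twelve winners at the 1.2 percent median from the text
print(f"compounded: {compounded_lift(wins):.1%}, additive: {additive_lift(wins):.1%}")
```

This prints about 15.4 percent compounded against 14.4 percent additive, inside the 12 to 18 percent range quoted above.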

A few practical operating notes from advisory engagements. The first is the analytics layer, the most underrated investment in a checkout programme: tagging on every form field, every error event, every step transition, and every payment-method selection. Operators who can answer "where in the checkout did this cohort drop off, and what did they encounter when they dropped off" have a diagnostic capacity that operators with weaker instrumentation cannot replicate. The instrumentation cost is largely one-time; the optimisation programme depends on it for its entire duration.
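
Once step transitions are tagged, the first diagnostic is step-to-step drop-off. A minimal sketch, assuming session counts per step have already been aggregated from the event stream; the step names and counts below are hypothetical.

```python
from typing import Dict, List, Tuple


def step_dropoff(step_counts: Dict[str, int]) -> List[Tuple[str, str, float]]:
    """Return (from_step, to_step, share lost) for each consecutive pair of
    steps, in the order the steps appear in the mapping."""
    steps = list(step_counts.items())
    return [
        (a, b, 1 - nb / na)
        for (a, na), (b, nb) in zip(steps, steps[1:])
    ]


# Hypothetical session counts per step for one cohort.
funnel = {"cart": 10_000, "checkout_start": 5_500, "shipping": 4_600,
          "payment": 4_100, "order": 3_400}
for a, b, lost in step_dropoff(funnel):
    print(f"{a} -> {b}: {lost:.0%} lost")
```

Running the same function per device or traffic source turns one funnel into the segmented view the mobile-specific queue needs.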

The second is the holdout discipline. Treatment groups in checkout experiments often see ramp-up effects, novelty effects, or contamination from concurrent changes. A small holdout group (1 to 5 percent of traffic, depending on volume) that remains on the unchanged baseline for the full duration of the programme provides a clean reference for cumulative-lift estimation that the rolling-pairwise comparison cannot. The holdout discipline is unfashionable because it sacrifices a small amount of immediate optimisation; the value is in the credibility of the year-end reporting.
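
A minimal sketch of the year-end readout against a persistent holdout, using a normal-approximation (Wald) interval on the difference in conversion rates. Converting the interval to relative terms by dividing by the holdout rate treats that rate as fixed, a simplification; the function name and the session and order counts are hypothetical.

```python
import math
from statistics import NormalDist


def lift_vs_holdout(conv_treated: int, n_treated: int,
                    conv_holdout: int, n_holdout: int,
                    confidence: float = 0.95):
    """Relative lift of the optimised experience over the holdout, with an
    approximate interval (Wald interval on the difference, divided by the
    holdout rate)."""
    p_t = conv_treated / n_treated
    p_h = conv_holdout / n_holdout
    diff = p_t - p_h
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_h * (1 - p_h) / n_holdout)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff / p_h, (diff - z * se) / p_h, (diff + z * se) / p_h


# Hypothetical: 3 percent of sessions held on the original checkout all year.
rel, low, high = lift_vs_holdout(407_400, 970_000, 11_850, 30_000)
print(f"cumulative lift {rel:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Most of the interval width comes from the small holdout arm, which is the trade-off behind the 1 to 5 percent sizing.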

The third is the post-launch monitoring. Winning experiments that ship to 100 percent of traffic sometimes do not sustain their measured lift after launch, due to ramp-up artefacts, regression from edge cases, or interaction with subsequent changes. The programme should monitor shipped winners for 4 to 12 weeks post-ship and be willing to roll back changes that fail to sustain. Operators who treat the ship as the end of the experiment lose visibility into the actual production impact.

Key Takeaways

Most checkout flows do not need redesigns. They need a disciplined audit against the six high-leverage friction categories (guest checkout, cost transparency, address-form usability, payment-method coverage, mobile-specific friction, error handling) and a queue of micro-optimisations prioritised by expected lift per engineering hour.
The Baymard Institute's research on 214 top-grossing US and EU sites finds that the average checkout has 39 distinct usability improvements available; the aggregate cart abandonment rate across studies from 2006 to 2023 averages roughly 70 percent.
The one-page versus multi-step debate is mostly a distraction. Both formats can produce high-converting checkouts when well-executed; the within-format execution matters far more than the format choice itself.
The Booking.com and ASOS public talks on experimentation discipline converge on the same operating posture: high-volume small experiments, ruthless killing of losers, accumulated lift from the small minority of winners, and structural redesigns reserved for specific forcing situations.
The marginal-lift framework (audit, candidate list, lift estimate, engineering cost estimate, strategic and risk multipliers, stack-rank) produces a defensible prioritisation that prevents the engineering team from spending capacity on changes that intuitively feel important but do not move the metric.
Redesigns are appropriate in specific cases: platform migration, regulatory requirements, multi-market expansion, the genuine structural ceiling after 12 to 24 months of disciplined micro-optimisation, and strategic repositioning justified on grounds broader than conversion. The structural-ceiling case is the one most often used to justify premature redesigns; the actual ceiling appears later than most operators believe.
The compounding programme that runs 8 to 20 well-powered experiments per quarter, with the discipline to honour sample sizes and kill losers quickly, produces a meaningfully larger cumulative lift over a one to two year horizon than the same capacity spent on a redesign, with a cleaner attribution map that informs the next round of decisions.