Glossary · Business Analytics

A/B Testing

also: split testing · online controlled experiments · OCE · randomized controlled trial · RCT

Definition

A/B testing is a randomized controlled experiment that splits users between a control and a treatment variant to estimate the causal effect of a change on a chosen metric. Statistical validity depends on randomization quality, adequate sample size, controls for novelty effects, and correction for multiple comparisons.
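
To make the sample-size requirement concrete, here is a minimal sketch that uses the standard normal-approximation formula for comparing two conversion rates. The function name `sample_size_per_arm`, the defaults for `alpha` and `power`, and the example rates are illustrative assumptions, not values taken from the essays listed on this page.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm of a two-sided, two-proportion test.

    baseline_rate: expected conversion rate of the control (e.g. 0.10).
    min_detectable_effect: smallest absolute lift worth detecting (e.g. 0.01).
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not 0 < p2 < 1 or min_detectable_effect == 0:
        raise ValueError("treatment rate must be between 0 and 1 and differ from baseline")

    # Critical values of the standard normal for the chosen error rates.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sum of the Bernoulli variances of the two arms.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / min_detectable_effect ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical scenario: 10% baseline, detect a 1-point absolute lift.
    print(sample_size_per_arm(0.10, 0.01))
```

Because the required sample scales with the inverse square of the detectable effect, halving the effect you want to detect roughly quadruples the traffic each arm needs.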

A/B testing applies the randomized controlled trial framework to product and marketing decisions. The framing in Kohavi, Tang, and Xu (2020) is canonical: an Overall Evaluation Criterion (OEC), random assignment, sample-size planning via the minimum detectable effect (MDE), and validity checks for Simpson's paradox and novelty effects, with Twyman's law (a result that looks unusually interesting is usually wrong) as the standing sanity check. Bayesian formulations (posterior-odds stopping rules) give up some of the false-positive control of frequentist methods in exchange for faster decisions. Production pitfalls include sample ratio mismatch (SRM), segment-level interaction effects, and multiple-comparisons inflation: at a 5% significance level, roughly 5% of comparisons with no true effect look significant on noise alone, so the chance of at least one false positive grows with every extra metric, segment, or variant examined.
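
Two of the pitfalls above can be checked with a few lines of arithmetic. The sketch below is a hedged illustration, not a production implementation: `srm_p_value` runs a one-degree-of-freedom chi-square goodness-of-fit test on the observed split, `family_wise_error_rate` shows how false positives accumulate across independent comparisons, and `holm_reject` applies the Holm step-down correction as one common remedy. All function names, the 50/50 expected split, and the example counts are assumptions made for illustration.

```python
import math


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """P-value of a chi-square test (1 degree of freedom) for sample ratio mismatch."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total * (1 - expected_control_share)
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def family_wise_error_rate(alpha, num_comparisons):
    """Chance of at least one false positive across independent true-null tests."""
    return 1 - (1 - alpha) ** num_comparisons


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns one reject flag per input p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


if __name__ == "__main__":
    # Hypothetical counts: a 50/50 test that logged 50,000 vs 51,500 users.
    print(srm_p_value(50_000, 51_500))
    print(family_wise_error_rate(0.05, 20))
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

A very small SRM p-value (many teams use a strict threshold such as 0.001, which is a convention rather than a rule) suggests the assignment or logging pipeline is broken, in which case the metric comparison should not be trusted until the cause is found.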

Essays on this concept

Business Analytics
Bayesian A/B Testing in Practice: When to Stop Experiments and How to Communicate Results to Non-Technical Stakeholders
Frequentist A/B testing answers a question nobody asked: 'If the null hypothesis were true, how surprising is this data?' Bayesian testing answers the question that matters: 'Given this data, what's the probability that B is actually better?'
Pricing Strategy
Pricing Experimentation Without the Legal Risk: An Operator Framework for Defensible A/B Tests
Price A/B tests are not, by themselves, illegal. Most of the legal risk lies in how the cohorts are formed, what data is used, and what the team can show a regulator a year later. This is the framework that survives the question.
E-commerce ML
Personalized Promotion Optimization: Uplift Modeling to Identify Who Needs a Discount vs. Who Would Buy Anyway
70% of promotional spend goes to customers who would have purchased at full price. Uplift modeling identifies the 30% whose behavior actually changes with a discount, and ignores the rest. The math isn't complicated. The organizational willingness to stop blanket discounting is.
Conversion Optimization
Trust Signals and Their Measurable Lift: A Field-Test Compendium
A field-test compendium of trust signals (SSL badges, guarantees, testimonials, reviews, press logos, accreditations) and what the actual lift literature says about each, with the standard caveat that trust-signal lift is highly context-dependent.
Conversion Optimization
Checkout Flow Micro-Optimization vs. Macro-Redesign
When small checkout tweaks return more than full rewrites, what the Baymard Institute research actually says, and a decision framework for choosing between incremental optimization and macro redesign.
Marketing Engineering
Unified Measurement Architecture: Connecting MMM, MTA, and Experimentation Into a Single Source of Truth
MMM says Facebook works. MTA says Google works. The incrementality test says neither works as well as you thought. Three measurement systems, three different answers. Here's how to reconcile them into one coherent picture.
Digital Economics
Attention Economics Quantified: Measuring the True CPM of Cognitive Load in Digital Advertising
CPM measures whether an ad loaded in a browser. It says nothing about whether a human noticed it. Here's a framework for pricing what actually matters: the cognitive cost of attention, and why the gap between CPM and true attention cost is where billions in ad spend disappear.
Conversion Optimization
Card Sorting and Information Architecture Validation in Production
The IA validation pipeline from open and closed card sorts to tree testing to first-click testing to production navigation A/B tests, and the under-discussed sample-divergence problem when card-sort participants do not match real visitors.
Business Analytics
Causal Discovery in Business Data: Applying PC Algorithm and FCI to Find Revenue Drivers Without Experiments
Correlation tells you that feature usage and retention move together. It doesn't tell you which causes which, or whether a third factor drives both. Causal discovery algorithms can untangle this from observational data alone.
Behavioral Economics
Choice Architecture at Scale: How Default Options Drive $2.3B in Incremental E-commerce Revenue
An empirical examination of default effects in digital commerce, from Thaler and Sunstein's nudge theory to the precise mechanics of how pre-selected options generate billions in revenue most consumers never consciously chose to spend.
E-commerce ML
Cold-Start Problem Solved: Few-Shot Learning for New Product Recommendations Using Meta-Learning
New products get no recommendations. No recommendations means no clicks. No clicks means no data. No data means no recommendations. Meta-learning breaks this loop by transferring knowledge from products that came before.
Conversion Optimization
Color, Contrast, and Accessibility as Conversion Levers
Accessibility is usually framed as compliance. The operator framing is that contrast, focus indicators, and motion preferences are first-order conversion levers, with measurable lift on a large addressable population.
Marketing Engineering
Creative Fatigue Detection Using Entropy Metrics: An Automated Framework for Ad Refresh Cycles
By the time your dashboard shows declining CTR, creative fatigue has already cost you weeks of wasted spend. Shannon entropy applied to engagement signals detects fatigue 11 days earlier than traditional frequency caps.
Conversion Optimization
CRO for B2B Long-Cycle Journeys: The Multi-Touch Reality
Why classical CRO assumptions break in B2B. Long cycles, multi-stakeholder committees, weak in-flight signals, and attribution noise turn funnel-stage optimisation into content-led measurement.
Conversion Optimization
The CRO Decision Pyramid: Where Conversion-Optimization Effort Actually Returns
Prioritizing CRO investment by tier. The base (speed, trust, accessibility) returns reliably. The middle (copy, layout, social proof) returns conditionally. The top (personalization) returns only with infrastructure.
Pricing Strategy
Currency Localization and Willingness-to-Pay Differentials
Local-currency presentation moves willingness to pay by 5 to 15% in tested field experiments. The math behind PPP adjustment, the operational complexity, and where the easy framing breaks down for B2B and tax.
Behavioral Economics
The Decoy Effect Reimagined: Dynamic Price Anchoring with Real-Time Behavioral Segmentation
A dominated third option can shift 22% more users to your premium plan. But the static decoy is dead. Here's how real-time behavioral data makes asymmetric dominance adaptive.
E-commerce ML
Demand Forecasting with Conformal Prediction: Reliable Uncertainty Intervals for Inventory Optimization
Your demand forecast says you'll sell 1,000 units next month. How confident is that prediction? Traditional models give you a number without honest uncertainty bounds. Conformal prediction gives you intervals with mathematical coverage guarantees and requires no distributional assumptions.
E-commerce ML
Dynamic Pricing Under Demand Uncertainty: A Contextual Bandit Approach with Fairness Constraints
Airlines have done dynamic pricing for decades. E-commerce is catching up, but without the fairness constraints that prevent algorithms from charging different people different prices for the same product based on inferred willingness to pay.
E-commerce ML
Graph Neural Networks for Cross-Sell: Modeling the Product Co-Purchase Network at Scale
Association rules find that beer and diapers are co-purchased. Graph neural networks understand why: the underlying structure of complementary needs, occasion-based shopping, and brand affinity networks that connect products across categories.
Marketing Engineering
Incrementality Testing at Scale: A Geo-Lift Framework for Measuring True Campaign Impact
Half your marketing budget is wasted. It's the classic joke, except now we can identify which half: geo-lift experiments measure what would have happened without the campaign, not just what happened with it.
Marketing Strategy
Jobs-to-Be-Done Segmentation Using NLP: Mining Customer Reviews to Discover Unmet Needs at Scale
Christensen said customers 'hire' products for jobs. Traditionally, discovering those jobs required expensive qualitative research. NLP applied to millions of customer reviews can surface the same jobs, plus ones that interviews miss because customers can't articulate them.
E-commerce ML
LLM-Powered Catalog Enrichment: Automated Attribute Extraction, Taxonomy Mapping, and SEO Generation
The average e-commerce catalog has 40% missing attributes, inconsistent taxonomy, and product descriptions written by suppliers who don't speak the customer's language. LLMs can fix all three, if you build the right quality assurance pipeline around them.
Conversion Optimization
Loading Speed as a Conversion Variable: Lab vs. Field Data
Why Lighthouse lab scores and Core Web Vitals field data disagree, how each correlates with conversion, and when lab optimization fails to translate to field gains.
Marketing Engineering
Marketing Mix Modeling in the Privacy-First Era: Bayesian Structural Time Series Without User-Level Data
Cookies are dying. Deterministic attribution is shrinking. The irony: the measurement approach from the 1960s, Marketing Mix Modeling, is making a comeback, now powered by Bayesian inference that would have been computationally impossible when it was first invented.
Behavioral Economics
Mental Accounting in Multi-Currency E-commerce: How Payment Framing Shifts Willingness to Pay by 23%
Thaler showed that people don't treat money as fungible. In cross-border e-commerce, currency display alone shifts willingness to pay by 23%, and most checkout flows ignore this entirely.
Business Analytics
Metric Ontology Design: Building a Self-Serve Analytics Layer That Doesn't Collapse Under Ambiguity
Ask five people in your company what 'revenue' means and you'll get five different numbers. The problem isn't the data warehouse; it's that nobody agreed on the definitions before building dashboards on top of them.
Behavioral Economics
The Mood Index: Reading Affect, Compulsivity, and Identity Signals in Cosmetics E-commerce Baskets
Cosmetics is the only consumer e-commerce category where four clinical psychology mechanisms operate at unusually high intensity at the same time. Each one leaves a distinct fingerprint in checkout data. Standard segmentation models miss most of it.
Marketing Engineering
Multi-Touch Attribution Is Broken: A Causal Inference Approach Using Directed Acyclic Graphs
MTA models overestimate retargeting by 340% and underestimate display by 62%. The fix isn't better heuristics; it's abandoning correlational attribution entirely in favor of causal graphs.
Conversion Optimization
The Personalization-Experimentation Paradox
Personalization platforms promise per-user pages. A/B tests assume equivalent groups. Reconciling the two requires heterogeneous treatment effects, uplift modeling, and an honest read of "personalized lift".
Pricing Strategy
Pricing Pages as Information Architecture
The pricing page is the highest-leverage UX surface in most SaaS products. Treat it as information architecture, and the conversion math reorganizes around plan structure, comparison cognition, and CTA placement.
E-commerce ML
Real-Time Fraud Detection at Checkout: A Streaming ML Pipeline Architecture with Sub-100ms Latency
You have 100 milliseconds to decide whether a transaction is fraudulent. In that window, you need to compute 200+ features from streaming data, run inference on a model trained on 1:1000 class imbalance, and return a score that balances revenue loss against customer friction.
Marketing Engineering
Building a Real-Time Personalization Engine: From Contextual Bandits to Deep Reinforcement Learning
A/B tests answer 'which variant is best on average.' Contextual bandits answer 'which variant is best for this user right now.' The difference in cumulative regret, and revenue, compounds daily.
E-commerce ML
Search Ranking as a Revenue Optimization Problem: Learning-to-Rank with Business Objective Regularization
E-commerce search is not Google search. When a user types 'running shoes,' the goal isn't to find the most relevant document; it's to surface the product most likely to be purchased at the highest margin. This reframes ranking as a constrained revenue optimization problem.
Business Analytics
Survival Analysis for Subscription Businesses: Cox Proportional Hazards vs. Deep Recurrent Models
Binary churn models answer the wrong question. 'Will this user churn?' matters less than 'When will this user churn?' Survival analysis models the timing, and the timing determines whether intervention is profitable.
E-commerce ML
Transformer-Based Product Embeddings: Outperforming Collaborative Filtering with Multimodal Representations
Collaborative filtering needs a user to buy before it can recommend. Transformer-based embeddings understand products from their descriptions, images, and the behavioral context of browsing sessions, with no purchase history required.
Pricing Strategy
Value-Based Pricing Operationalized: A Measurement Framework
Most teams talk about value-based pricing without operationalizing it. The conjoint and Van Westendorp workflow, the stated-versus-revealed gap, and the cases where it breaks down in practice.

Authoritative references

en.wikipedia.org/wiki/A/B_testing
