Glossary · E-commerce ML
Contextual Bandits
Definition
Contextual bandits are online learning algorithms that choose an action (a price, a layout, a recommendation) given a context (user features), observe a reward, and update their policy to balance exploration and exploitation. They are the modern foundation of real-time personalization and dynamic pricing.
Unlike A/B tests, which split traffic evenly until a winner is declared, contextual bandits learn continuously and route each user to the currently best-performing option, minimizing cumulative regret over time. LinUCB, Thompson Sampling, and neural bandits handle different context/reward structures. Fairness-constrained variants add explicit constraints to prevent systematic disadvantage for protected groups. Practical deployment requires careful reward definition, off-policy evaluation, and guardrails against distribution shift.
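The choose-observe-update loop can be sketched with the simplest variant, Beta-Bernoulli Thompson Sampling over a discrete context. This is a hypothetical toy, not a production recipe: the context `"mobile"`, the actions `layout_a`/`layout_b`, and the conversion rates are all made up, and methods like LinUCB or neural bandits would replace the per-(context, action) posterior with a shared model over context features.

```python
import random
from collections import defaultdict

class ThompsonSamplingBandit:
    """Beta-Bernoulli Thompson Sampling with one posterior per (context, action).
    Minimal sketch assuming a small discrete context space and binary rewards
    (e.g. click / no click); all names here are illustrative."""

    def __init__(self, actions):
        self.actions = list(actions)
        # Beta(1, 1) uniform prior: alpha counts successes + 1, beta failures + 1
        self.alpha = defaultdict(lambda: 1)
        self.beta = defaultdict(lambda: 1)

    def choose(self, context):
        # Draw a plausible reward rate from each action's posterior and play the
        # argmax: posterior uncertainty drives exploration, learned means drive
        # exploitation, and the two trade off automatically as data accumulates.
        return max(self.actions,
                   key=lambda a: random.betavariate(self.alpha[(context, a)],
                                                    self.beta[(context, a)]))

    def update(self, context, action, reward):
        # A binary reward updates the Beta posterior by simple counting.
        if reward:
            self.alpha[(context, action)] += 1
        else:
            self.beta[(context, action)] += 1

# Simulated environment: on mobile, layout_b converts 3x better than layout_a.
random.seed(0)
bandit = ThompsonSamplingBandit(["layout_a", "layout_b"])
true_rate = {"layout_a": 0.10, "layout_b": 0.30}
for _ in range(2000):
    action = bandit.choose("mobile")
    reward = random.random() < true_rate[action]
    bandit.update("mobile", action, reward)

pulls = {a: bandit.alpha[("mobile", a)] + bandit.beta[("mobile", a)] - 2
         for a in bandit.actions}
```

After a few thousand rounds the bandit concentrates traffic on the better layout without ever fully abandoning the other one, which is exactly the regret-minimizing behavior that distinguishes it from a fixed-split A/B test.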
Essays on this concept
- Marketing Engineering
Building a Real-Time Personalization Engine: From Contextual Bandits to Deep Reinforcement Learning
A/B tests answer 'which variant is best on average.' Contextual bandits answer 'which variant is best for this user right now.' The difference in cumulative regret — and revenue — compounds daily.
- E-commerce ML
Dynamic Pricing Under Demand Uncertainty: A Contextual Bandit Approach with Fairness Constraints
Airlines have done dynamic pricing for decades. E-commerce is catching up — but without the fairness constraints that prevent algorithms from charging different people different prices for the same product based on inferred willingness to pay.
- E-commerce ML
Cold-Start Problem Solved: Few-Shot Learning for New Product Recommendations Using Meta-Learning
New products get no recommendations. No recommendations means no clicks. No clicks means no data. No data means no recommendations. Meta-learning breaks this loop by transferring knowledge from products that came before.
- E-commerce ML
Personalized Promotion Optimization: Uplift Modeling to Identify Who Needs a Discount vs. Who Would Buy Anyway
70% of promotional spend goes to customers who would have purchased at full price. Uplift modeling identifies the 30% whose behavior actually changes with a discount — and ignores the rest. The math isn't complicated. The organizational willingness to stop blanket discounting is.