Glossary · E-commerce ML

Contextual Bandits

Definition

Contextual bandits are online learning algorithms that choose an action (a price, a layout, a recommendation) given a context (user features), observe a reward, and update their policy to balance exploration and exploitation. They are the modern foundation of real-time personalization and dynamic pricing.
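The choose/observe/update loop described above can be sketched in its simplest form with an epsilon-greedy policy. This is a toy illustration, not a production recipe: the two-arm environment, the payoff probabilities, and the `EpsilonGreedy` class are all invented for the example, and for simplicity the policy here ignores the context when estimating rewards (a truly contextual policy would condition its estimates on the user features).

```python
import random

class EpsilonGreedy:
    """Toy epsilon-greedy policy: exploits the arm with the best running
    average reward, and explores a random arm at rate epsilon."""
    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.randrange(len(self.means))      # explore
        return max(range(len(self.means)), key=self.means.__getitem__)  # exploit

    def update(self, context, action, reward):
        self.counts[action] += 1
        # incremental running-mean update
        self.means[action] += (reward - self.means[action]) / self.counts[action]

# Hypothetical environment: arm 1 converts at 0.7, arm 0 at 0.3.
random.seed(0)
policy = EpsilonGreedy(n_actions=2, epsilon=0.1)
for t in range(2000):
    context = None                       # user features would go here
    action = policy.choose(context)      # balance exploration/exploitation
    reward = 1.0 if random.random() < (0.3, 0.7)[action] else 0.0
    policy.update(context, action, reward)
```

After enough rounds, traffic concentrates on the better arm while the epsilon fraction keeps estimating the worse one, which is exactly the regret-minimizing behavior the definition refers to.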

Unlike A/B tests, which split traffic in fixed proportions until an experiment concludes, contextual bandits learn continuously and shift traffic toward the currently best-performing option, minimizing cumulative regret over time. Algorithms such as LinUCB, Thompson Sampling, and neural bandits suit different context and reward structures. Fairness-constrained variants add explicit constraints to prevent systematic disadvantage for protected groups. Practical deployment requires careful reward definition, off-policy evaluation, and guardrails against distribution shift.
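Of the algorithms named above, LinUCB is the most compact to sketch: each arm keeps a ridge-regression estimate of reward as a linear function of the context, and picks the arm whose estimate plus an uncertainty bonus is highest. The code below is a minimal sketch of that idea; the arm count, context dimension, `alpha` value, and the simulated linear-reward environment are all assumptions made for the example.

```python
import numpy as np

class LinUCBArm:
    """One arm of a disjoint-model LinUCB bandit (sketch)."""
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha        # width of the exploration bonus
        self.A = np.eye(dim)      # ridge-regression Gram matrix
        self.b = np.zeros(dim)    # reward-weighted sum of contexts

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b    # ridge estimate of the reward weights
        # predicted reward + upper-confidence exploration bonus
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Hypothetical setup: 3 arms, 4-dimensional user contexts,
# rewards simulated as linear in the context plus noise.
rng = np.random.default_rng(0)
arms = [LinUCBArm(dim=4) for _ in range(3)]
true_w = rng.normal(size=(3, 4))   # unknown per-arm reward weights

for t in range(500):
    x = rng.normal(size=4)
    chosen = max(range(3), key=lambda a: arms[a].ucb(x))
    reward = true_w[chosen] @ x + rng.normal(scale=0.1)
    arms[chosen].update(x, reward)
```

The exploration bonus shrinks as an arm accumulates observations in a given direction of context space, so the policy naturally shifts from exploring to exploiting, per arm and per context.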
