
Marketing Mix Modeling in the Privacy-First Era: Bayesian Structural Time Series Without User-Level Data

Cookies are dying. Deterministic attribution is shrinking. The irony: the measurement approach from the 1960s — Marketing Mix Modeling — is making a comeback, now powered by Bayesian inference that would have been computationally impossible when the method was invented.


TL;DR: With Apple's ATT gutting mobile attribution and Chrome killing third-party cookies, the 1960s measurement approach -- Marketing Mix Modeling -- is making a comeback, now powered by Bayesian structural time series and MCMC sampling that were computationally impossible when MMM was invented. Modern Bayesian MMM handles carryover effects, diminishing returns, and channel interactions without any user-level data, making it the only privacy-compliant measurement approach that works across both digital and offline channels.


The 1960s Called. They Want Their Measurement Back.

The most sophisticated measurement technique in modern digital marketing was invented before the moon landing.

Marketing Mix Modeling -- the practice of using regression on aggregate time-series data to estimate the contribution of each marketing channel to business outcomes -- was born in the packaged goods aisles of the 1960s. Procter & Gamble, General Mills, and Unilever needed to understand whether their television spots were selling more soap. They did not have cookies, device IDs, or click-through rates. They had sales data, media spend data, and statisticians with slide rules.

For three decades, MMM was the gold standard. Then the internet arrived and promised something better: deterministic, user-level attribution. Every click tracked. Every conversion assigned. Every dollar accounted for with surgical precision.

That promise is now collapsing.

Apple's App Tracking Transparency gutted mobile attribution in 2021. Google confirmed third-party cookie deprecation in Chrome. The EU's Digital Markets Act restricts cross-site tracking. Browser fingerprinting is under regulatory and technical siege. The deterministic attribution infrastructure that powered two decades of digital marketing is being dismantled, piece by piece, by the convergence of privacy regulation and platform economics.

And so the industry is turning back to the 1960s. But the version of MMM being built today would be unrecognizable to those packaged-goods statisticians. It runs on Bayesian inference, Markov Chain Monte Carlo sampling, and structural time-series decomposition -- computational techniques that would have required more processing power than existed on Earth when the method was first conceived.

This is the story of how the oldest measurement idea in marketing became the most modern.

A Brief History of Counting What Counts

The lineage of Marketing Mix Modeling traces to Neil Borden's 1964 article "The Concept of the Marketing Mix" and McCarthy's 4Ps framework. But the statistical practice emerged from econometrics -- specifically, from the application of multivariate regression to sales data.

The early models were simple. Regress weekly sales against television GRPs, print insertions, promotional spending, and a seasonal dummy variable. The coefficients tell you how much each variable contributes to sales. Divide each channel's contribution by its cost and you have a rough return on investment.

These models had severe limitations. They assumed linear relationships between spend and outcomes. They ignored carryover effects -- the fact that an ad seen today might drive a purchase next week. They could not capture diminishing returns. And they required years of stable data to produce reliable estimates.

But they worked well enough for CPG companies running stable media mixes on annual planning cycles. The decision grain was coarse -- shift 5% of budget from print to television -- and the models were accurate enough at that resolution.

The digital era changed the decision grain. Suddenly marketers needed to know whether a specific Facebook campaign targeting 25-34 year-old women in the Midwest was generating incremental conversions. MMM, operating on weekly national aggregates, could not answer that question. Multi-touch attribution (MTA), operating on user-level click and impression data, could. Or at least it claimed it could.

For fifteen years, MTA dominated. Marketing teams built entire organizational structures around it. Attribution vendors raised billions. The fundamental assumption -- that tracking individual user journeys from first touch to conversion produced accurate measurement -- went largely unquestioned.

Then the privacy reckoning arrived.

The collapse of deterministic attribution is not a single event. It is a cascade.

The Privacy Cascade: Key Events Degrading User-Level Tracking

| Year | Event | Impact on Attribution |
| --- | --- | --- |
| 2017 | GDPR enacted (enforced 2018) | Consent requirements reduced trackable EU users by 30-50% |
| 2020 | CCPA enforcement begins | California opt-out rates reached 40% by 2022 |
| 2021 | iOS 14.5 App Tracking Transparency | Only 25% of iOS users opted into tracking |
| 2022 | Safari and Firefox block third-party cookies by default | 35% of web traffic became untrackable overnight |
| 2023 | Google announces Privacy Sandbox timeline | Signal: the entire industry must prepare for cookieless measurement |
| 2024 | Digital Markets Act gatekeepers designated | Cross-site tracking restricted for large platforms in EU |
| 2025 | Chrome third-party cookie deprecation begins | The last major browser joins the privacy-first default |
| 2026 | Privacy Sandbox APIs stabilize | Aggregate, cohort-based signals replace user-level tracking |

Each event removed a piece of the attribution puzzle. Individually, each was manageable -- marketers found workarounds, modeled around gaps, relied on platform-reported data. Collectively, they destroyed the foundation.

The math is simple. If you can observe 95% of user journeys, MTA is useful. If you can observe 50%, it is misleading. If you can observe 25%, it is fiction. By 2025, the observable share of cross-platform user journeys for the median advertiser had dropped below 40%.

This is not a technology failure. It is a values shift. Societies decided that the convenience of ad targeting did not justify pervasive surveillance. The measurement systems built on that surveillance are collateral damage.

MMM does not require user-level data. It does not require cookies, device IDs, or cross-site tracking. It requires two things: a time series of business outcomes (revenue, conversions, signups) and a time series of marketing inputs (spend, impressions, GRPs by channel). These are first-party data that every company already possesses.

This is why every major technology company -- Google, Meta, Amazon -- has invested heavily in MMM tooling since 2021. Not because they love the 1960s approach. Because it is the only measurement methodology that survives the privacy transition intact.

The Architecture of a Bayesian Structural Time Series Model


Classical MMM used ordinary least squares (OLS) regression. Fit a line. Read the coefficients. Done.

The modern approach replaces OLS with Bayesian Structural Time Series (BSTS) modeling. The difference is not incremental. It is architectural.

A BSTS model decomposes a time series into components. The full model can be written as:

$$y_t = \mu_t + \gamma_t + \sum_{c=1}^{C} \beta_c \cdot \text{Saturation}\!\left(\text{Adstock}(x_{c,t})\right) + \mathbf{Z}_t \boldsymbol{\delta} + \epsilon_t$$

where $\mu_t$ is the trend, $\gamma_t$ is seasonality, $\beta_c$ is the coefficient for channel $c$, $\mathbf{Z}_t$ are control variables, and $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$. The individual components are:

Trend. The long-term direction of the business. Is revenue growing at 3% per quarter? Shrinking? Flat? The trend component captures this baseline trajectory, independent of marketing activity.

Seasonality. Weekly, monthly, and annual patterns. Retail spikes at Christmas. B2B software sales dip in August. Ice cream sells more in July. The model separates these recurring patterns from marketing effects.

Regression component. The estimated effect of each marketing channel, after accounting for trend and seasonality. This is where the MMM "answer" lives -- the incremental contribution of each channel.

Residual. Everything the model cannot explain. In a good model, this should look like random noise. If it shows patterns, something is missing.

The Bayesian part means that instead of producing point estimates ("TV drives $2.3M in revenue"), the model produces posterior distributions ("TV drives between $1.8M and $2.9M in revenue, with 90% probability"). This uncertainty quantification -- the same philosophical shift that makes Bayesian A/B testing superior to frequentist approaches -- is not a luxury. It is the difference between confident bad decisions and honest good ones.

BSTS Decomposition: Percentage of Revenue Variance Explained by Component (Illustrative)

In a well-specified model, the marketing channels component typically explains 20-35% of revenue variance. This is a humbling number. It means that two-thirds or more of your revenue trajectory is driven by factors outside marketing's control -- product quality, market growth, competitive dynamics, macroeconomic conditions. Every attribution model that assigns 100% of credit to marketing touchpoints is not merely inaccurate. It is delusional.

The Bayesian framework uses Markov Chain Monte Carlo (MCMC) sampling to estimate the posterior distributions of all parameters simultaneously. This means every parameter estimate accounts for uncertainty in every other parameter. The trend estimate affects the seasonality estimate affects the channel contribution estimates. MCMC handles this joint estimation naturally, producing credible intervals that reflect genuine uncertainty rather than the false precision of frequentist confidence intervals.
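To make the mechanics concrete, here is a toy random-walk Metropolis sampler in plain NumPy that estimates a single channel coefficient and the noise scale jointly on simulated data. A production MMM would use a specialized probabilistic programming framework with Hamiltonian Monte Carlo; the synthetic data, priors, and step sizes here are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: outcome responds to (already adstocked) spend with true beta = 2.0
x = rng.exponential(scale=1.0, size=104)           # 2 years of weekly spend (scaled)
y = 2.0 * x + rng.normal(0.0, 0.5, size=104)       # outcome with noise

def log_posterior(beta, sigma):
    """Log posterior: Half-Normal priors on beta and sigma, Gaussian likelihood."""
    if beta < 0 or sigma <= 0:
        return -np.inf                              # priors constrain to positive values
    log_prior = -0.5 * (beta / 5.0) ** 2 - 0.5 * (sigma / 2.0) ** 2
    resid = y - beta * x
    log_lik = -len(y) * np.log(sigma) - 0.5 * np.sum(resid**2) / sigma**2
    return log_prior + log_lik

# Random-walk Metropolis: propose a move, accept with probability min(1, ratio)
samples = []
beta, sigma = 1.0, 1.0
lp = log_posterior(beta, sigma)
for _ in range(5000):
    b_new, s_new = beta + rng.normal(0, 0.1), sigma + rng.normal(0, 0.1)
    lp_new = log_posterior(b_new, s_new)
    if np.log(rng.uniform()) < lp_new - lp:         # Metropolis acceptance rule
        beta, sigma, lp = b_new, s_new, lp_new
    samples.append((beta, sigma))

betas = np.array(samples)[1000:, 0]                 # drop burn-in
print(f"beta posterior: mean={betas.mean():.2f}, "
      f"90% CI=({np.quantile(betas, 0.05):.2f}, {np.quantile(betas, 0.95):.2f})")
```

The output is a distribution over `beta`, not a point estimate — exactly the shift from "TV drives $2.3M" to "TV drives between $1.8M and $2.9M with 90% probability" described above.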

Adstock Transformation: Memory in Media

When you see a television commercial at 8:00 PM, its effect does not vanish at 8:01 PM. Some fraction of the impression persists -- in memory, in brand salience, in the probability that you will recall the brand when shopping tomorrow or next week. This persistence is called carryover, and the mathematical representation of it is called the adstock transformation.

The simplest adstock model is geometric decay:

$$\text{Adstock}(t) = \text{Spend}(t) + \lambda \cdot \text{Adstock}(t-1)$$

Expanding the recursion, the adstock at time $t$ is a weighted sum of all past spend:

$$\text{Adstock}(t) = \sum_{k=0}^{\infty} \lambda^k \cdot \text{Spend}(t-k)$$

where $\lambda$ is the decay rate, between 0 and 1. A $\lambda$ of 0.7 means that 70% of an impression's effect carries forward to the next period. After one week, 70% remains. After two weeks, 49%. After four weeks, 24%. After eight weeks, 6%.

Different channels have different decay rates, and these differences carry strategic implications.

Adstock Decay Curves by Channel (Retention of Effect Over 8 Weeks)

Paid search has near-zero carryover. The intent exists at the moment of the query. If you do not capture it then, it is gone. Television and podcast advertising show long tails -- brand impressions persist for weeks. Out-of-home (OOH) advertising sits in between, with effects that decay over days to weeks depending on exposure frequency.

The implication for measurement is critical. A model that ignores adstock will systematically overvalue channels with immediate effects (paid search, retargeting) and undervalue channels with delayed effects (television, podcast, brand campaigns). This is exactly the same bias that plagued multi-touch attribution -- and it happens in MMM too, if the adstock transformation is omitted or mis-specified.

Here is a Python implementation of adstock and Hill saturation transformations for use in an MMM pipeline:

import numpy as np
 
def geometric_adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Apply geometric adstock transformation.
 
    Args:
        spend: Array of weekly channel spend values.
        decay: Carryover rate (lambda), between 0 and 1.
    Returns:
        Adstocked spend series.
    """
    adstocked = np.zeros_like(spend, dtype=float)
    adstocked[0] = spend[0]
    for t in range(1, len(spend)):
        adstocked[t] = spend[t] + decay * adstocked[t - 1]
    return adstocked
 
def hill_saturation(x: np.ndarray, K: float, gamma: float) -> np.ndarray:
    """Apply Hill saturation function.
 
    Args:
        x: Input spend (post-adstock).
        K: Half-saturation point.
        gamma: Shape parameter controlling steepness.
    Returns:
        Saturated response between 0 and 1.
    """
    return x**gamma / (K**gamma + x**gamma)
 
# Example: TV spend over 52 weeks
tv_spend = np.random.exponential(scale=50_000, size=52)
tv_adstocked = geometric_adstock(tv_spend, decay=0.75)
tv_saturated = hill_saturation(tv_adstocked, K=100_000, gamma=1.5)

In a Bayesian MMM, the decay parameter lambda is not fixed. It is estimated from the data, with a prior distribution reflecting what we know about each channel's typical carryover behavior. The posterior distribution of lambda tells us how confident we should be in our estimate of each channel's memory.
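As a sketch of that idea, the snippet below recovers a decay rate from simulated data using a simple grid approximation to the posterior (a stand-in for full MCMC), with the Beta(3, 3) prior that is a common default. The spend series, the true decay of 0.7, and the noise level are all synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def geometric_adstock(spend, decay):
    """Geometric adstock: each week carries `decay` of last week's stock forward."""
    out = np.zeros_like(spend, dtype=float)
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + decay * out[t - 1]
    return out

# Simulate: outcome responds to adstocked spend with true decay 0.7
spend = rng.exponential(scale=1.0, size=156)        # 3 years of weekly spend
y = 1.5 * geometric_adstock(spend, 0.7) + rng.normal(0, 0.3, size=156)

# Grid approximation to the posterior of lambda with a Beta(3, 3) prior
lambdas = np.linspace(0.01, 0.99, 99)
log_post = np.empty_like(lambdas)
for i, lam in enumerate(lambdas):
    x = geometric_adstock(spend, lam)
    beta_hat = x @ y / (x @ x)                      # profile out beta via least squares
    resid = y - beta_hat * x
    log_lik = -0.5 * len(y) * np.log(resid @ resid)  # Gaussian lik, sigma profiled out
    log_prior = 2 * np.log(lam) + 2 * np.log1p(-lam)  # Beta(3, 3) kernel
    log_post[i] = log_lik + log_prior

post = np.exp(log_post - log_post.max())
post /= post.sum()
print(f"posterior mean of lambda: {np.sum(lambdas * post):.2f}")
```

With three years of data and a clear signal, the posterior concentrates near the true decay; with noisier or shorter data, the Beta(3, 3) prior pulls the estimate toward 0.5.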

Saturation Curves: The Lie of Linear Returns

The second critical transformation in modern MMM is the saturation function. It captures a fact that every marketer intuits but most attribution models ignore: diminishing returns.

Your first $10,000 of Facebook spend reaches new audiences, generates fresh impressions, drives real conversions. Your 500th $10,000 of Facebook spend bombards the same exhausted audience with creative they have seen seventeen times. The incremental return on the 500th unit is a fraction of the first.

The standard saturation function is the Hill function (borrowed from biochemistry, where it describes dose-response relationships):

$$\text{Saturation}(x) = \frac{x^{\gamma}}{K^{\gamma} + x^{\gamma}}$$

where $K$ is the half-saturation point (the spend level at which 50% of maximum effect is reached) and $\gamma$ controls the steepness. An alternative parameterization uses exponential saturation:

$$\text{Saturation}(x) = 1 - \exp(-\alpha \cdot x^{\gamma})$$

where $\alpha$ controls the speed of saturation and $\gamma$ controls the shape. When $\gamma = 1$, the curve is a simple exponential approach to saturation. When $\gamma > 1$, the curve has an S-shape -- slow initial response, rapid growth, then saturation. The S-shape is common for channels that require a threshold of awareness before driving action.
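The two parameterizations behave very differently at low spend. A small NumPy sketch (parameter values chosen for illustration, not fitted) makes the contrast concrete:

```python
import numpy as np

def hill_saturation(x, K, gamma):
    """Hill function: reaches 0.5 exactly at x = K; S-shaped when gamma > 1."""
    return x**gamma / (K**gamma + x**gamma)

def exponential_saturation(x, alpha, gamma):
    """Exponential form: concave from the first dollar when gamma = 1."""
    return 1 - np.exp(-alpha * x**gamma)

spend = np.linspace(0, 200_000, 5)                  # evaluate at a few spend levels

# gamma = 1: marginal returns decline from the very first dollar
concave = exponential_saturation(spend, alpha=2e-5, gamma=1.0)
# gamma = 3: threshold first, then rapid growth, then saturation
s_shaped = hill_saturation(spend, K=100_000, gamma=3.0)

for s, c, h in zip(spend, concave, s_shaped):
    print(f"spend={s:>9,.0f}  exponential={c:.2f}  hill={h:.2f}")
```

Note how the S-shaped Hill curve is nearly flat below half of $K$ while the exponential form has already delivered most of its response there.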

Saturation Curves by Channel: Incremental Revenue per $1,000 Spend

The chart reveals a pattern that upends conventional digital marketing wisdom. Paid search and social media saturate early. Their marginal returns decline steeply after modest spend levels. Television and podcast, by contrast, have flatter saturation curves -- their returns decline more gradually because they access larger, less-targeted audiences with lower frequency caps.

This means the optimal budget allocation depends entirely on total budget size. A company spending $100K per month might correctly allocate 70% to digital channels. The same company spending $2M per month might correctly allocate 50% or more to television and audio -- not because those channels became better, but because digital channels hit their ceiling.
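This budget dependence can be demonstrated with a greedy allocator that hands each increment of budget to the channel with the highest marginal return under its saturation curve. The response-curve parameters below are invented for illustration, not fitted values:

```python
import numpy as np

# Hypothetical fitted response curves: revenue = scale * Hill(spend; K, gamma)
channels = {
    "search": {"scale": 300_000, "K": 50_000,  "gamma": 1.2},  # saturates early
    "social": {"scale": 350_000, "K": 80_000,  "gamma": 1.2},
    "tv":     {"scale": 900_000, "K": 600_000, "gamma": 1.0},  # flat, slow curve
}

def revenue(ch, spend):
    p = channels[ch]
    return p["scale"] * spend**p["gamma"] / (p["K"]**p["gamma"] + spend**p["gamma"])

def allocate(total_budget, step=1_000):
    """Greedy allocation: each $step goes to the channel with highest marginal return."""
    alloc = {ch: 0.0 for ch in channels}
    for _ in range(int(total_budget // step)):
        best = max(channels,
                   key=lambda ch: revenue(ch, alloc[ch] + step) - revenue(ch, alloc[ch]))
        alloc[best] += step
    return alloc

for budget in (100_000, 2_000_000):
    a = allocate(budget)
    digital = (a["search"] + a["social"]) / budget
    print(f"budget ${budget:>9,}: digital share = {digital:.0%}")
```

Under these (hypothetical) curves, the small budget goes almost entirely to digital, while the large budget shifts heavily toward television -- the same mix question, answered differently purely because of total budget size.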

Prior Specification: The Art of Informed Ignorance

Here is where Bayesian MMM separates from both classical MMM and black-box machine learning. The analyst must specify prior distributions for every parameter in the model. This is simultaneously the method's greatest strength and its most dangerous pitfall.

A prior distribution encodes what you believe about a parameter before seeing the data. If you believe television advertising has a positive effect on revenue (a reasonable belief, given a century of evidence), you can specify a prior that is concentrated on positive values. The data then updates this prior to produce a posterior distribution. If the data strongly disagrees with the prior, the posterior will shift away from it. If the data is ambiguous, the prior provides stabilization.

Why does this matter? Because marketing data is noisy. Weekly time-series data provides, at best, 104-156 observations for a two-to-three-year modeling window. With 8-12 marketing channels plus trend, seasonality, and control variables, you are estimating 30-50 parameters from roughly 100 data points. Without priors, the model is underdetermined. With priors, the model is regularized -- constrained to produce estimates that are consistent with both the data and reasonable prior beliefs.

The danger is obvious. If your priors are wrong, your posteriors will be wrong. An analyst who specifies a strong prior that "television is highly effective" will produce a model that confirms television is highly effective, even if the data suggests otherwise. This is not a flaw in Bayesian inference. It is a flaw in the analyst.

Common Prior Specifications in Bayesian MMM

| Parameter | Typical Prior Distribution | Rationale | Risk of Misspecification |
| --- | --- | --- | --- |
| Channel coefficient (beta) | Half-Normal(0, sigma) | Marketing should not decrease sales; constrains to positive effects | May mask genuinely negative ROI channels (e.g., bad creative) |
| Adstock decay (lambda) | Beta(3, 3) per channel | Centers at 0.5 with moderate uncertainty; updated by data | If true decay is very fast or very slow, prior may dominate with limited data |
| Saturation alpha | Gamma(1, 1) | Weakly informative; allows data to determine saturation speed | Low risk; wide prior lets data speak |
| Saturation gamma | Beta(2, 2) | Centers at moderate S-shape; allows linear to highly concave | Overly tight priors here can force incorrect curve shapes |
| Seasonal amplitude | Normal(0, 1) | Allows both positive and negative seasonal effects | Low risk for weekly data with 2+ years of history |
| Trend slope | Normal(0, 0.1) | Assumes slow-moving trend; penalizes wild fluctuations | May smooth over genuine structural breaks (e.g., COVID, product launches) |

The discipline of prior specification forces the analyst to be explicit about assumptions. In classical MMM, assumptions are hidden in modeling choices that are rarely documented. In Bayesian MMM, assumptions are parameters that you must name, justify, and subject to sensitivity analysis. This transparency is a feature, not a burden.
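One cheap form of sensitivity analysis is simply to sample the priors and inspect what they imply before fitting anything. The draws below follow the table's distributions; the specific scale choices (e.g., a Half-Normal sigma of 1) are illustrative, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Draws from the prior table (scale parameters here are illustrative choices)
priors = {
    "channel_beta":  np.abs(rng.normal(0.0, 1.0, n)),  # Half-Normal(0, 1): positive only
    "adstock_decay": rng.beta(3, 3, n),                # Beta(3, 3): centered at 0.5
    "sat_alpha":     rng.gamma(1.0, 1.0, n),           # Gamma(1, 1): weakly informative
    "sat_gamma":     rng.beta(2, 2, n),                # Beta(2, 2): moderate S-shape
    "seasonal_amp":  rng.normal(0.0, 1.0, n),          # Normal(0, 1)
    "trend_slope":   rng.normal(0.0, 0.1, n),          # Normal(0, 0.1): slow-moving trend
}

for name, draws in priors.items():
    lo, hi = np.quantile(draws, [0.05, 0.95])
    print(f"{name:>13}: mean={draws.mean():+.2f}, 90% interval=({lo:+.2f}, {hi:+.2f})")
```

If a prior's 90% interval already excludes values you consider plausible, the model will never seriously entertain them -- better to discover that before the MCMC run, not after.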

Confounders: Everything That Can Lie to Your Model

A confounder is a variable that affects both marketing spend and business outcomes, creating a spurious correlation that the model mistakes for a causal effect. Confounders are the primary reason MMMs produce wrong answers, and managing them is the most important and least glamorous part of the work.

The most dangerous confounders:

Seasonality. You spend more on marketing during Q4 because it is the holiday season. Revenue is also higher during Q4 because it is the holiday season. If the model does not properly account for seasonal patterns, it will attribute the seasonal revenue lift to the increased marketing spend. This inflates estimated marketing ROI.

Competitive activity. Your competitor launches a product. Your sales decline. You respond by increasing advertising. If the model does not include a control for competitive activity, it will see the combination of higher spend and lower sales and conclude your advertising is ineffective. The opposite error is also possible: your competitor cuts their budget, your sales increase for reasons unrelated to your marketing, and the model gives your campaigns credit.

Promotions and pricing. Running a promotion simultaneously with a media campaign -- which happens constantly -- makes it nearly impossible to disentangle the media effect from the promotional effect without careful modeling.

Macroeconomic conditions. Consumer confidence, unemployment rates, interest rates -- these affect both marketing budgets (companies spend more when confident) and consumer spending (consumers buy more when confident). The correlation between marketing spend and revenue may reflect shared sensitivity to economic conditions rather than a causal relationship.

Product changes. A new feature launch, a viral moment, a PR crisis -- these affect outcomes but are rarely included in the MMM input data. The model absorbs their effects into whatever marketing variables happen to coincide temporally.

The Bayesian framework helps but does not solve this problem. Priors can prevent implausible estimates (a channel with negative ROI when you know it works). Posterior predictive checks can reveal model misfit. But no statistical technique can correct for a variable that is not in the model. This is a data engineering problem, not a modeling problem.

Google's CausalImpact: The Counterfactual Machine

In 2015, Kay Brodersen and colleagues at Google published a paper that changed the MMM landscape. CausalImpact uses BSTS to estimate the causal effect of an intervention -- a campaign launch, a market entry, a policy change -- by constructing a synthetic counterfactual.

The logic is elegant. Before the intervention, you observe the relationship between your target time series (say, revenue in a test market) and a set of control time series (revenue in markets where no intervention occurred). The BSTS model learns this relationship. After the intervention, the model projects what the target time series would have been, had the intervention not occurred. The difference between the actual observed values and this projected counterfactual is the estimated causal effect.

This is fundamentally different from standard regression-based MMM. Standard MMM estimates average effects across the entire time series. CausalImpact estimates the effect of a specific, discrete event. It answers questions like "What was the incremental impact of launching our TV campaign in February?" rather than "What is the average marginal return of television advertising?"

The method requires two conditions. First, you need control time series that are correlated with the target but not affected by the intervention. In practice, this means geographic controls (markets where you did not run the campaign) or temporal controls (pre-intervention periods). Second, the pre-intervention relationship must be stable enough to project forward.
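The counterfactual logic can be sketched in a few lines. The version below substitutes plain OLS for the BSTS model and synthetic geo series for real markets, so it illustrates only the structure of the method, not its uncertainty machinery:

```python
import numpy as np

rng = np.random.default_rng(3)

weeks = 104
intervention = 80                                   # campaign launches at week 80

# Control markets share a common demand pattern with the test market
common = np.sin(np.arange(weeks) * 2 * np.pi / 52) + 0.02 * np.arange(weeks)
controls = np.column_stack([common + rng.normal(0, 0.1, weeks) for _ in range(3)])
target = 2.0 + 1.5 * common + rng.normal(0, 0.1, weeks)
target[intervention:] += 0.8                        # true causal lift of 0.8 per week

# Learn target ~ controls on the pre-period only (OLS stand-in for BSTS)
X_pre = np.column_stack([np.ones(intervention), controls[:intervention]])
coef, *_ = np.linalg.lstsq(X_pre, target[:intervention], rcond=None)

# Project the counterfactual into the post-period; the gap is the estimated effect
X_post = np.column_stack([np.ones(weeks - intervention), controls[intervention:]])
counterfactual = X_post @ coef
effect = target[intervention:] - counterfactual
print(f"estimated lift per week: {effect.mean():.2f} (true: 0.80)")
```

Everything hinges on the pre-period relationship holding after the intervention -- which is exactly the stability condition described above.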

When these conditions hold, CausalImpact produces remarkably credible estimates. Google uses it internally for measuring the impact of product launches, pricing changes, and marketing campaigns. The open-source R package has been cited in over 1,500 academic papers.

When the conditions do not hold -- when there are no clean controls, when the pre-intervention relationship is unstable, when the intervention is gradual rather than discrete -- CausalImpact produces estimates that are precise but inaccurate. The Bayesian credible intervals look tight. The answer looks definitive. But it is definitively wrong, because the counterfactual is built on a foundation that does not hold.

Robyn vs. Meridian: The Open-Source MMM Wars

Two open-source MMM tools now dominate the market: Meta's Robyn (released 2021) and Google's Meridian (released 2024). They represent fundamentally different philosophies, and choosing between them is a decision about your measurement worldview.

Meta Robyn vs. Google Meridian: Architecture Comparison

| Dimension | Meta Robyn | Google Meridian |
| --- | --- | --- |
| Statistical framework | Frequentist optimization (Nevergrad) | Fully Bayesian (MCMC via JAX/NumPyro) |
| Estimation | Point estimates via gradient-free optimization | Posterior distributions via Hamiltonian Monte Carlo |
| Uncertainty quantification | Pareto-optimal model selection from many runs | Native credible intervals on all parameters |
| Adstock model | Geometric and Weibull decay | Geometric decay with hierarchical priors |
| Saturation model | Hill function | Hill function with reach/frequency integration |
| Prior specification | Ridge regression hyperparameters | Explicit Bayesian priors on all parameters |
| Speed | Fast (minutes per model) | Slower (hours for full MCMC) |
| Calibration | Supports lift test calibration | Supports lift test and geo-experiment calibration |
| Language | R with Python wrapper | Python (JAX) |
| Organizational bias | May favor Meta channels if uncalibrated | May favor Google channels if uncalibrated |
| Best for | Teams needing fast iteration, many scenarios | Teams needing rigorous uncertainty quantification |

Robyn's approach is pragmatic. Run the optimizer thousands of times. Collect the Pareto-optimal solutions -- models that balance fit and parsimony. Present the analyst with a set of plausible models to choose from. This is fast, scalable, and produces reasonable answers without requiring deep Bayesian expertise.

The limitation is philosophical. Robyn does not produce genuine posterior distributions. It produces a set of point estimates from different optimization runs. The "uncertainty" comes from the spread across Pareto-optimal models, not from a principled probabilistic framework. This makes it harder to answer questions like "What is the probability that reallocating $100K from search to TV will improve ROI?" Robyn can show you what different models suggest. It cannot give you a probability.

Meridian is rigorous. Full Bayesian inference via Hamiltonian Monte Carlo. Explicit priors. Genuine posterior distributions. Native integration with Google's geo-experiment framework for calibration. It produces answers that are statistically defensible in ways that Robyn's are not.

The limitation is practical. MCMC is slow. Prior specification requires expertise. The model is less forgiving of poor data quality. And -- a point that deserves emphasis -- Meridian's default priors and data integration are designed by Google engineers who work for a company that sells advertising. The same caveat applies to Robyn and Meta.

In practice, the choice often comes down to team capability. If your data science team has strong Bayesian skills and can invest weeks in model development, Meridian produces more defensible results. If your team needs to iterate quickly and communicate results to non-technical stakeholders, Robyn's scenario-based approach is more accessible.

The best teams use both. Robyn for rapid exploration. Meridian for final estimates. Disagreements between the two highlight areas of genuine uncertainty that deserve experimental validation.

Frequency Decomposition: Peeling Apart the Signal

One of the most powerful and underused techniques in modern MMM is frequency decomposition -- the practice of separating a time series into components that operate at different frequencies.

Marketing effects operate at different timescales. A paid search campaign operates at daily or hourly frequency -- spend today, conversions today. A television branding campaign operates at weekly or monthly frequency -- build awareness this month, harvest conversions next quarter. Seasonal patterns operate at annual frequency.

Fourier analysis or wavelet decomposition can separate these frequencies. The payoff is dramatic. Instead of asking "Does TV drive revenue?" (a question that mixes all frequencies), you can ask "Does TV spending at the 4-8 week frequency correlate with revenue at the 4-8 week frequency?" This frequency-specific analysis strips away confounders that operate at other timescales.

For example, both TV spending and revenue might show strong annual seasonality. A naive regression would capture this shared seasonality as a "TV effect." Frequency decomposition separates the annual component (which is a confounder) from the medium-frequency component (which is more likely to reflect genuine marketing impact).

The mathematics are not complex. A discrete Fourier transform converts a time-domain signal into a frequency-domain representation. Filtering specific frequency bands and then transforming back produces time-domain signals that contain only the frequencies of interest. Regression on these filtered signals produces cleaner causal estimates.
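A minimal band-pass filter along these lines, using NumPy's FFT on synthetic data (the 6-week "campaign" cycle, the annual amplitude, and the noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(11)
weeks = 156                                         # 3 years of weekly data

t = np.arange(weeks)
annual = 3.0 * np.sin(2 * np.pi * t / 52)           # annual seasonality (confounder)
medium = 1.0 * np.sin(2 * np.pi * t / 6)            # ~6-week campaign-driven cycle
signal = annual + medium + rng.normal(0, 0.2, weeks)

def bandpass(x, min_period, max_period):
    """Keep only frequencies whose period (in weeks) falls inside the band."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)          # cycles per week
    periods = np.divide(1.0, freqs, out=np.full_like(freqs, np.inf), where=freqs > 0)
    spectrum[(periods < min_period) | (periods > max_period)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

# Isolate the 4-8 week band: the annual cycle is stripped out entirely
filtered = bandpass(signal, min_period=4, max_period=8)
print(f"correlation with 6-week component: {np.corrcoef(filtered, medium)[0, 1]:.2f}")
```

The filtered series tracks the medium-frequency component closely and is essentially uncorrelated with the annual cycle -- the confounder has been removed by construction rather than modeled away.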

This technique is standard in climate science, signal processing, and macroeconomics. It is rarely used in marketing analytics. The opportunity for teams willing to apply it is substantial.

Validation: How to Know When Your Model Is Wrong

An MMM without validation is an expensive opinion. The field has converged on four validation approaches, listed in order of increasing rigor and cost.

1. In-sample fit metrics. MAPE (Mean Absolute Percentage Error), R-squared, DIC (Deviance Information Criterion) for Bayesian models. These tell you how well the model fits the data it was trained on. They are necessary but insufficient -- a model can perfectly fit historical data and produce useless forecasts.

2. Out-of-time validation. Hold out the most recent 8-12 weeks of data. Train the model on everything prior. Predict the holdout period. Compare predictions to actuals. This tests the model's ability to forecast, which is a much harder bar than fitting historical data. A model that cannot predict the next quarter's revenue trajectory has no business informing budget allocation.

3. Lift test calibration. Run randomized controlled experiments -- geographic holdout tests, randomized budget pauses, incrementality tests -- and compare the experimental estimates to the model's estimates for the same intervention. If the model says pausing Facebook spend should reduce revenue by $200K and the geo-test shows a $190K reduction, the model is well-calibrated. If the model says $200K and the test shows $50K, the model is wrong and must be corrected.

4. Posterior predictive checks. Simulate data from the fitted model's posterior distribution. Compare the simulated data's statistical properties (mean, variance, autocorrelation, distribution shape) to the actual data. If the model cannot generate data that looks like reality, its assumptions are wrong.
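The out-of-time procedure (approach 2) is simple enough to sketch end to end. The forecaster below is a naive seasonal baseline standing in for a fitted MMM, and the revenue series is synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)

# 3 years of weekly revenue with trend + annual seasonality + noise
t = np.arange(156)
revenue = 100 + 0.3 * t + 15 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 3, 156)

holdout = 12                                        # hold out the last 12 weeks
train, test = revenue[:-holdout], revenue[-holdout:]

# Placeholder forecaster: same weeks last year, shifted by the estimated trend
trend_per_week = (train[-52:].mean() - train[-104:-52].mean()) / 52
forecast = train[-52:-52 + holdout] + 52 * trend_per_week

mape = np.mean(np.abs((test - forecast) / test)) * 100
print(f"out-of-time MAPE over {holdout} weeks: {mape:.1f}%")
```

Swap the naive baseline for the MMM's forecast and the comparison becomes the real test: a model whose holdout MAPE is worse than a seasonal-naive baseline has no business informing budget allocation.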

Model Validation Hierarchy: Cost vs. Confidence

Lift test calibration is expensive. A proper geographic holdout test requires suppressing advertising in randomly selected markets for 4-8 weeks, forgoing revenue to measure incrementality. Most companies run one or two per year. But these experiments are the only ground truth available for MMM validation. Without them, you are trusting a model that has never been checked against reality.

The cadence matters. Run incrementality experiments continuously. Use the results to calibrate the MMM. Use the MMM to plan the next round of experiments. This feedback loop between experimentation and modeling is the state of the art, and forms the core of what we describe as a unified measurement architecture connecting MMM, MTA, and experimentation. Companies that operate it -- typically the largest advertisers with dedicated measurement science teams -- consistently outperform those that rely on either experiments or models alone.

When MMM Fails

MMM is not a universal solution. It fails predictably under specific conditions, and pretending otherwise is malpractice.

Small budgets. MMM requires variance in spend to estimate effects. If you spend $5,000 per week on each of four channels with minimal week-to-week variation, the model cannot distinguish channel effects from noise. As a rough heuristic, weekly spend needs to swing by at least 2x between low and high periods -- a high coefficient of variation -- for the model to produce meaningful estimates. Most companies below $500K in annual media spend lack sufficient variance.

Few channels. With two or three marketing channels, the model has very few degrees of freedom. Every confounder becomes a larger problem because there are fewer channels to help identify the model. Five channels is a practical minimum for robust estimation.

Highly correlated spend patterns. If you always increase all channels simultaneously (common during product launches and holiday seasons) and decrease all channels simultaneously (common during budget cuts), the model cannot disentangle their individual effects. Multicollinearity is the technical term. The practical solution is deliberate variation -- intentionally varying channel spend asynchronously, even if it feels suboptimal in the short term, to generate the data your model needs.

Short time horizons. BSTS models need 2-3 years of weekly data to reliably estimate seasonal components. With less than 18 months of data, the model is unreliable. Startups and companies that recently changed their marketing strategy dramatically are poor candidates for MMM.

Rapidly changing businesses. MMM assumes that the relationship between spend and outcomes is relatively stable over the modeling window. If your product, pricing, competitive landscape, or target audience changed substantially during the window, the model's estimates reflect an average of conditions that no longer exist. This is the stationarity assumption, and violating it is the most common reason MMMs produce misleading results.

Offline-heavy conversion. If the outcome variable (e.g., in-store purchases) is measured with significant lag or error, the time-series alignment between inputs and outputs breaks down. This was less of a problem in the CPG era (when scanner data provided clean weekly sales) than it is for businesses with long, opaque conversion paths.

Implementation Roadmap for Mid-Market Companies

For companies spending $1M-$20M annually on marketing with 4-8 channels, here is a practical implementation roadmap. This is not theory. This is the sequence of decisions and investments that produces a functioning MMM program within six months.

Month 1: Data Audit and Assembly

The most common reason MMM projects fail is not modeling -- it is data. You need clean, weekly, channel-level records of spend, impressions (or GRPs), and any available engagement metrics for every marketing channel, going back at least two years. You also need weekly revenue or conversion data at the same granularity.

Build the data pipeline first. Automate the extraction from every advertising platform (Google Ads, Meta Ads, programmatic DSPs, direct buys), your CRM, your web analytics, and your finance system. Store it in a single, versioned dataset. This pipeline is your most valuable long-term asset -- more valuable than any single model.

Include control variables: promotional calendar, pricing changes, competitive activity (share of voice or ad intelligence data), macroeconomic indicators (consumer confidence, unemployment), weather if relevant, and any significant business events (product launches, outages, PR incidents).
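A minimal sketch of the Month 1 deliverable, assuming per-channel extracts have already been pulled from each platform; the column names, channels, and helper are illustrative, and a real pipeline would read from platform APIs rather than in-memory frames:

```python
import pandas as pd

def assemble_weekly_dataset(channel_frames, revenue, controls):
    """channel_frames: {channel_name: DataFrame with 'week' and 'spend'}.
    revenue / controls: DataFrames keyed on 'week'. Returns one wide table."""
    out = revenue.copy()
    for name, df in channel_frames.items():
        weekly = (df.groupby("week", as_index=False)["spend"].sum()
                    .rename(columns={"spend": f"{name}_spend"}))
        out = out.merge(weekly, on="week", how="left")
    out = out.merge(controls, on="week", how="left")
    # A missing spend week means the channel was dark, not missing data.
    spend_cols = [c for c in out.columns if c.endswith("_spend")]
    out[spend_cols] = out[spend_cols].fillna(0.0)
    return out.sort_values("week").reset_index(drop=True)

weeks = pd.date_range("2023-01-02", periods=3, freq="W-MON")
revenue = pd.DataFrame({"week": weeks, "revenue": [100.0, 120.0, 90.0]})
controls = pd.DataFrame({"week": weeks, "promo": [0, 1, 0]})
search = pd.DataFrame({"week": weeks[:2], "spend": [10.0, 12.0]})
table = assemble_weekly_dataset({"search": search}, revenue, controls)
print(table)
```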

Month 2: Baseline Model

Start with Robyn. It is faster to iterate and more forgiving of imperfect data. Build a baseline model with all channels, standard adstock and saturation transformations, and default hyperparameters. The goal is not a final answer. The goal is a first conversation. Show the results to marketing leadership. Ask: "Does this match your intuition? Where does the model surprise you?"

Surprises are the most valuable output of a first model. If the model says email marketing drives 30% of revenue and your CMO knows email is a retention channel, not an acquisition channel, that tells you something is wrong with the model -- possibly a confounder, possibly a data error, possibly a prior that needs adjustment.
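The "standard adstock and saturation transformations" mentioned above can be sketched as follows; geometric decay and a Hill curve are the common defaults, though the decay rate and half-saturation point here are illustrative placeholders for fitted hyperparameters:

```python
import numpy as np

def geometric_adstock(spend, decay=0.5):
    """Carryover: each week retains `decay` of the previous week's
    adstocked effect, modeling advertising that works with a lag."""
    out = np.zeros(len(spend), dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

def hill_saturation(x, half_sat, slope=1.0):
    """Diminishing returns via a Hill curve: response is 0.5 at
    x == half_sat and approaches 1.0 asymptotically."""
    x = np.asarray(x, dtype=float)
    return x**slope / (x**slope + half_sat**slope)

spend = np.array([100.0, 0.0, 0.0, 0.0])   # one burst, then dark
adstocked = geometric_adstock(spend, decay=0.5)
# carryover halves each week: 100, 50, 25, 12.5
print(adstocked)
print(hill_saturation(adstocked, half_sat=50.0))
```

In a full model, spend is adstocked first, then saturated, then multiplied by a channel coefficient; the order and functional forms are exactly what Month 3's specification testing varies.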

Month 3: Refinement and Priors

Incorporate domain knowledge. Adjust priors based on the baseline model's surprises and your team's expertise. Add control variables that the baseline model missed. Test alternative adstock and saturation specifications. If you have the Bayesian expertise, build a parallel model in Meridian and compare results.

Month 4: Calibration Experiment Design

Design your first incrementality experiment. Pick the channel with the largest estimated effect and the highest strategic importance. Design a geographic holdout test: suppress that channel in randomly selected markets for 4-6 weeks while maintaining it in control markets. This experiment will either validate or invalidate your model's estimate for that channel.
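The randomization step can be sketched as below; the market list, holdout fraction, and seed are illustrative, and real designs often stratify by market size before randomizing rather than shuffling flat as done here:

```python
import random

def assign_geo_holdout(markets, holdout_fraction=0.3, seed=42):
    """Randomly split markets into holdout (channel suppressed) and
    control (channel maintained) groups for a geo incrementality test."""
    rng = random.Random(seed)
    shuffled = markets[:]
    rng.shuffle(shuffled)
    k = max(1, round(len(shuffled) * holdout_fraction))
    return {"holdout": sorted(shuffled[:k]), "control": sorted(shuffled[k:])}

markets = ["ATL", "BOS", "CHI", "DAL", "DEN", "LAX", "MIA", "NYC", "PHX", "SEA"]
groups = assign_geo_holdout(markets)
print(groups)
```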

Month 5: Run Experiment and Refine

Execute the holdout test. Continue refining the model with any new data. Begin building the budget optimization layer -- the tool that takes the model's channel-level ROI curves and recommends optimal budget allocation under constraints.
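One simple way to sketch the budget optimization layer is a greedy allocator over the model's response curves; because MMM saturation curves are concave, repeatedly funding the channel with the highest marginal return approximates the optimum. The curves, step size, and caps below are illustrative stand-ins for model outputs and business constraints:

```python
def optimize_budget(response_curves, total_budget, step=1000.0, caps=None):
    """Greedy allocator: give the next `step` dollars to whichever channel
    has the highest marginal response, respecting optional per-channel caps."""
    channels = list(response_curves)
    alloc = {c: 0.0 for c in channels}
    caps = caps or {}
    spent = 0.0
    while spent + step <= total_budget:
        best, best_gain = None, 0.0
        for c in channels:
            if alloc[c] + step > caps.get(c, float("inf")):
                continue
            gain = response_curves[c](alloc[c] + step) - response_curves[c](alloc[c])
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break  # nothing left with positive marginal return
        alloc[best] += step
        spent += step
    return alloc

curves = {  # hypothetical diminishing-returns revenue curves
    "search": lambda s: 3.0 * s / (1 + s / 50_000),
    "social": lambda s: 2.0 * s / (1 + s / 80_000),
}
print(optimize_budget(curves, total_budget=100_000))
```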

Month 6: Calibrate and Operationalize

Incorporate the experiment results. If the model's estimate for the tested channel was within 30% of the experimental result, you have reasonable confidence in the model. If not, diagnose why and adjust. Build a quarterly refresh cycle: update data, refit the model, run one incrementality experiment, recalibrate. Repeat indefinitely.
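The pass/fail check described above amounts to a relative-error comparison; a minimal sketch, with the 30% tolerance from the text and made-up ROI figures:

```python
def calibration_check(model_roi, experiment_roi, tolerance=0.3):
    """Compare the MMM's ROI estimate for a channel against the geo-test
    result; returns (passes, relative error)."""
    rel_error = abs(model_roi - experiment_roi) / abs(experiment_roi)
    return rel_error <= tolerance, round(rel_error, 3)

print(calibration_check(model_roi=2.6, experiment_roi=2.1))  # (True, 0.238)
print(calibration_check(model_roi=4.0, experiment_roi=2.1))  # (False, 0.905)
```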

Implementation Timeline: 6-Month MMM Program Build

| Month | Primary Activity | Key Deliverable | Estimated Investment |
| --- | --- | --- | --- |
| Month 1 | Data audit and pipeline construction | Clean, automated weekly dataset (2+ years) | $15K-40K (engineering time or vendor) |
| Month 2 | Baseline model (Robyn) | First channel-level contribution estimates | $10K-25K (analyst time or consultant) |
| Month 3 | Model refinement and prior calibration | Calibrated model with domain-informed priors | $10K-20K (analyst time) |
| Month 4 | Incrementality experiment design | Geo-holdout test protocol for top channel | $5K-10K (design and coordination) |
| Month 5 | Experiment execution + optimization layer | Running experiment; budget optimizer prototype | $20K-80K (foregone revenue in holdout markets) |
| Month 6 | Calibration and operationalization | Validated model with quarterly refresh process | $10K-15K (analysis and documentation) |

Total investment for a mid-market company: $70K-$190K over six months, including the opportunity cost of the holdout experiment. This is roughly the cost of one senior marketing hire. The model, if well-built and maintained, will influence the allocation of millions in annual spend. The ROI on measurement infrastructure is almost always the highest-returning investment a marketing organization can make.

The Uncomfortable Return of Aggregate Thinking

The digital marketing industry spent two decades building a cult of the individual. Individual user journeys. Individual touchpoint attribution. Individual-level targeting and personalization. The entire adtech ecosystem was architected around the belief that understanding individual behavior was the path to marketing effectiveness.

MMM asks you to abandon that belief. Not because individual behavior does not matter, but because you can no longer observe it -- and even when you could, the observation was often more misleading than informative.

Multi-touch attribution told you that the last-click channel was the hero. MMM tells you that the awareness channel you could never attribute was doing the real work. MTA said "cut the unmeasurable channels -- they aren't producing clicks." MMM says "those unmeasurable channels were producing the demand that the measurable channels harvested."

This is uncomfortable for organizations built on MTA. It restructures power. The paid search team that justified its budget with definitive click-to-conversion paths now competes on equal footing with the brand team that could never prove anything. The analytics team that built dashboards around user-level funnels must learn time-series econometrics. The CMO who reported "we drove 47,000 attributed conversions this month" must now say "our model estimates that marketing contributed 35-45% of revenue this quarter, with the following distribution across channels."

That second statement is less satisfying. It is also closer to the truth.

The privacy transition is forcing marketing measurement to grow up. User-level attribution was measurement's adolescence -- confident, precise, and often wrong. Bayesian MMM is its maturity -- humble, probabilistic, and disciplined by experimentation.

The 1960s statisticians did not have the computational power to do what we can do today. They did not have MCMC sampling, hierarchical priors, or open-source probabilistic programming frameworks. But they had the right instinct: measure what you can observe (aggregate spend, aggregate outcomes), be honest about what you cannot observe (individual causal pathways), and validate your models against controlled experiments.

That instinct, it turns out, was not primitive. It was ahead of its time.

References

  • Borden, N. H. (1964). The concept of the marketing mix. Journal of Advertising Research, 4(2), 2-7.

  • Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247-274.

  • Chan, D., & Perry, M. (2017). Challenges and opportunities in media mix modeling. Google Research Technical Report.

  • De Jong, P., & Penzer, J. (1998). Diagnosing shocks in time series. Journal of the American Statistical Association, 93(442), 796-806.

  • Durbin, J., & Koopman, S. J. (2012). Time Series Analysis by State Space Methods (2nd ed.). Oxford University Press.

  • Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

  • Jin, Y., Wang, Y., Sun, Y., Chan, D., & Koehler, J. (2017). Bayesian methods for media mix modeling with carryover and shape effects. Google Research Technical Report.

  • Lopes, H. F., & West, M. (2004). Bayesian model assessment in factor analysis. Statistica Sinica, 14(1), 41-67.

  • Meta Open Source. (2022). Robyn: Continuous & semi-automated MMM built with ridge regression and evolutionary optimization. GitHub Repository.

  • Google. (2024). Meridian: An open-source Bayesian Marketing Mix Model. GitHub Repository.

  • Scott, S. L., & Varian, H. R. (2014). Predicting the present with Bayesian structural time series. International Journal of Mathematical Modelling and Numerical Optimisation, 5(1-2), 4-23.

  • Simester, D. I., Hu, Y., Brynjolfsson, E., & Anderson, E. T. (2020). Advertising effectiveness measurement: Intermediary ad networks and their incentives. Marketing Science, 39(2), 268-287.
