TL;DR: MMM says Facebook drives 34% of revenue, MTA says Google drives 38%, and incrementality tests say both are delivering roughly half what either model claims. Unifying these three measurement systems -- with experiments as the calibration anchor that keeps MMM and MTA honest -- resolves contradictions that cause companies spending $10M+ on marketing to misallocate millions per quarter based on whichever model supports the strategy someone already believes in.
The Three Kingdoms Problem
Your Marketing Mix Model says Facebook drives 34% of incremental revenue. Your Multi-Touch Attribution platform says Google paid search drives 38%. Your latest incrementality test -- a properly designed geo-experiment -- says both channels are delivering roughly half the value that either model claims.
Three measurement systems. Three different answers. One budget to allocate.
This is not a hypothetical scenario. It is the default state of measurement at any company spending more than $10 million annually on marketing. The three kingdoms -- Media Mix Modeling, Multi-Touch Attribution, and Experimentation -- have evolved independently, built by different teams, governed by different assumptions, and optimized for different time horizons. They agree on almost nothing.
The consequences are not academic. When these systems disagree, organizations default to politics. The team that owns the most favorable model wins the budget argument. The CMO picks whichever number supports the strategy they already believe in. And the company systematically misallocates millions of dollars per quarter because nobody has built a coherent framework for reconciling contradictory measurement signals.
The industry has known about this problem for over a decade. What has changed in the last three years is that we finally have the statistical machinery and the organizational playbooks to fix it. Not by picking one method and ignoring the others. Not by averaging the numbers and hoping. By building a unified measurement architecture where each method plays a specific, calibrated role -- and where experiments serve as the anchor that keeps everything honest.
This is how to build that architecture.
Why Each Method Alone Is Insufficient
Before we can unify, we need to be precise about what each method does well and what it gets wrong.
Media Mix Modeling (MMM)
MMM is an econometric approach that uses aggregate time-series data -- weekly or monthly spend by channel, along with external variables like seasonality, macroeconomic indicators, and competitive activity -- to estimate the marginal contribution of each channel to an outcome variable (revenue, conversions, signups).
The approach dates back to the 1960s, when consumer packaged goods companies needed to measure the impact of television and print advertising without any user-level tracking. It works by observing statistical relationships between spend variation and outcome variation over time.
What it does well. MMM captures offline channels that digital attribution cannot see. It accounts for long-term brand effects and saturation curves. It is privacy-compliant by design because it operates on aggregate data -- no cookies, no device graphs, no user-level tracking. In a post-iOS 14.5 world where digital attribution has lost 30-60% of its signal, Bayesian structural time series MMM has experienced a justified resurgence.
Where it fails. MMM requires substantial historical data -- typically 2-3 years of weekly observations per channel -- to produce stable coefficient estimates. It struggles with channels that have low spend variation (if you spend $50,000 on podcasts every week for two years, the model cannot isolate the podcast effect). It operates at a granularity that is too coarse for tactical optimization. And its most fundamental limitation is that correlation in time-series data is not causation. If you always increase Facebook spend during Q4 because that is when demand is highest, MMM may attribute the Q4 revenue lift to Facebook rather than to seasonal demand.
Multi-Touch Attribution (MTA)
MTA uses user-level event data -- ad impressions, clicks, site visits, conversions -- to assign credit to individual touchpoints along the customer journey. Models range from simple heuristics (last-click, first-click, linear) to algorithmic approaches (Shapley value, Markov chain, data-driven attribution in Google Ads).
What it does well. MTA operates at the user level, which means it can optimize in near-real-time. It captures the customer journey in high resolution. It enables tactical decisions: which keywords to bid on, which creative to rotate, which audience segments to target. For digital-native businesses where the entire funnel is observable online, MTA provides granularity that MMM cannot match.
Where it fails. MTA only sees what it can track. It is blind to offline touchpoints, blind to view-through effects beyond short attribution windows, and increasingly blind to cross-device journeys as privacy restrictions tighten. More critically, MTA is structurally biased toward lower-funnel channels. A Google brand search ad that captures demand created by a podcast ad gets full credit in last-click models and disproportionate credit in most algorithmic models. MTA answers "which touchpoints preceded conversion?" -- a description question -- while pretending to answer "which touchpoints caused conversion?" -- a causal question. These are profoundly different questions, and the causal inference limitations of multi-touch attribution mean the gap between them is where billions of dollars are misallocated.
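To make the structural bias concrete, here is a toy sketch of how common heuristic models divide credit across a hypothetical three-touch journey (channel names are illustrative). Note how last-click hands the brand search ad everything, regardless of what created the demand:

```python
journey = ["podcast", "facebook", "google_brand_search"]  # ordered touchpoints, one conversion

def last_click(journey):
    """All credit to the final touchpoint before conversion."""
    return {journey[-1]: 1.0}

def first_click(journey):
    """All credit to the touchpoint that started the journey."""
    return {journey[0]: 1.0}

def linear(journey):
    """Equal credit to every touchpoint."""
    share = 1.0 / len(journey)
    return {touch: share for touch in journey}

print(last_click(journey))   # {'google_brand_search': 1.0}
print(first_click(journey))  # {'podcast': 1.0}
print(linear(journey))       # each touch gets ~0.333
```

None of these rules answers the causal question; each simply redistributes observed correlation according to a convention chosen in advance.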
Experimentation (Incrementality Testing)
Experimentation -- typically geo-experiments or randomized holdout tests -- directly measures the causal incremental impact of a channel by comparing outcomes in exposed and unexposed groups. A geo-lift experiment might withhold Facebook advertising in 30 randomly selected DMAs for four weeks while maintaining it in 30 matched DMAs, then measure the revenue difference.
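A minimal sketch of the analysis step, with simulated numbers standing in for the 30 holdout and 30 control DMAs (a production design would add matched-market selection and a pre-period adjustment):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical indexed weekly revenue during the 4-week test window.
control = rng.normal(100, 12, size=30)  # DMAs where Facebook ads kept running
holdout = rng.normal(92, 12, size=30)   # DMAs where Facebook ads were withheld

lift = control.mean() - holdout.mean()  # incremental revenue per DMA-week
se = np.sqrt(control.var(ddof=1) / 30 + holdout.var(ddof=1) / 30)
t_stat, p_value = stats.ttest_ind(control, holdout, equal_var=False)

print(f"lift={lift:.1f}, 95% CI=({lift - 1.96*se:.1f}, {lift + 1.96*se:.1f}), p={p_value:.3f}")
```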
What it does well. Experimentation is the only method that produces causal estimates of incremental value. It is the gold standard for a reason: randomized controlled trials are the gold standard in medicine, in policy evaluation, and in product development. There is no methodological dispute about whether a well-executed experiment provides valid causal inference. It does.
Where it fails. Experiments are expensive and disruptive. You must actually turn off or reduce spending in a channel to measure its incremental value, which means forgoing revenue during the test. They measure one channel at a time (or a small number), making it impractical to test every channel every quarter. They capture the incremental value of a channel at the specific spend level tested, not the full response curve. And they require sufficient scale -- small channels or small markets produce underpowered tests with wide confidence intervals that don't help anyone make decisions.
The Strengths and Weaknesses Matrix
The following matrix crystallizes why no single method suffices and where each method needs the others.
Measurement Method Comparison Matrix
| Dimension | MMM | MTA | Experimentation |
|---|---|---|---|
| Causal validity | Low: correlational | Low: correlational | High: randomized |
| Granularity | Channel/week | User/touchpoint | Channel or tactic |
| Time horizon | Long-term (years) | Short-term (days/weeks) | Point-in-time |
| Offline channels | Yes | No | Yes (geo-experiments) |
| Privacy compliance | High (aggregate data) | Low (user-level tracking) | High (aggregate comparison) |
| Cost to implement | Medium | High (data infrastructure) | High (opportunity cost) |
| Speed of insight | Slow (monthly/quarterly) | Fast (near real-time) | Slow (4-8 week tests) |
| Channel coverage | All channels simultaneously | Digital channels only | One channel at a time |
| Saturation curves | Yes | No | Partial (at tested spend level) |
| Cross-channel interaction | Limited | Limited | No (single-channel tests) |
| Signal post-iOS 14.5 | Stable | Degraded 30-60% | Stable |
Read the matrix column by column and the problem is clear. MMM covers all channels and captures saturation but lacks causal validity and tactical granularity. MTA provides granularity and speed but is blind to offline, biased toward lower-funnel, and degraded by privacy changes. Experimentation provides causal truth but is slow, expensive, and narrow in scope.
No column is complete. Every column has critical gaps that another column fills. The unified architecture is not about choosing one -- it is about building the connective tissue between all three.
Experiments as Ground Truth: The Measurement Hierarchy
The first principle of a unified measurement architecture is establishing a hierarchy of evidence. Not all measurement signals are created equal, and treating them as interchangeable inputs leads to incoherent outputs.
The hierarchy is straightforward:
Level 1: Randomized experiments. These produce causal estimates with known uncertainty bounds. They are the anchor.
Level 2: Quasi-experiments. Natural experiments, regression discontinuity designs, and instrumental variable approaches. Weaker than randomized experiments but still designed to isolate causal effects.
Level 3: Calibrated models. MMM and algorithmic attribution models that have been calibrated against Level 1 or Level 2 evidence. They extend experimental findings to channels and time periods where experiments have not been run.
Level 4: Uncalibrated models. Default MTA platforms, heuristic attribution, and uncalibrated MMM. These provide directional signals but should never be treated as ground truth for budget allocation.
The practical implication of this hierarchy: when an experiment contradicts your MMM or MTA, the experiment wins. Full stop. The model gets recalibrated. Not the other way around.
This sounds obvious but it is organizationally painful. The analytics team that spent six months building an MMM does not want to hear that their model overstates Facebook's contribution by 40%. The performance marketing team whose bonuses are tied to MTA-reported ROAS does not want to hear that their channel attribution is structurally biased. Establishing the measurement hierarchy is as much a political act as a technical one. It requires executive sponsorship and a shared commitment to causal truth over convenient narrative.
Bayesian Calibration: Using Experiments as Priors for MMM
The most powerful technical mechanism for unifying measurement is Bayesian calibration -- using experimental results as informative priors in the MMM estimation process. The same Bayesian reasoning that transforms A/B testing from a hypothesis-testing ritual into a decision-making framework applies here at the portfolio level.
Traditional MMM is frequentist. It estimates channel coefficients from the data alone, with flat or weakly informative priors. This means the model can produce any estimate that the time-series correlation structure supports, including estimates that are wildly inconsistent with experimental evidence.
Bayesian MMM changes this. Instead of letting the model estimate channel effects from scratch, you encode experimental results as prior distributions on the channel coefficients. If a geo-experiment found that Facebook's incremental CPA is $42 with a standard error of $8, you set a normal prior centered at $42 with a standard deviation of $8 on the Facebook coefficient:
$$\beta_{\text{FB}} \sim \mathcal{N}\left(\hat{\mu}_{\text{exp}},\ \hat{\sigma}_{\text{exp}}^2\right)$$

where $\hat{\mu}_{\text{exp}}$ is the experimental point estimate and $\hat{\sigma}_{\text{exp}}$ is the experimental standard error. The model is then free to update this prior with the time-series data -- but it cannot stray far from the experimental evidence without strong contradictory signal.
The mechanics work as follows:
Step 1. Run incrementality experiments on your largest channels. Start with the top 3-5 channels by spend. These experiments produce point estimates and confidence intervals for each channel's incremental contribution.
Step 2. Encode experimental results as priors. For each experimentally measured channel, set a Gaussian prior on the MMM coefficient: mean = experimental point estimate, standard deviation = experimental standard error. For channels without experimental evidence, use weakly informative priors.
Step 3. Fit the Bayesian MMM. The posterior distribution on each channel coefficient now reflects both the experimental evidence and the time-series data. Where they agree, the posterior narrows (higher confidence). Where they disagree, the posterior pulls toward the experimental evidence (because the prior is informative).
Step 4. Inspect the tension. If the time-series data strongly disagrees with the experimental prior for a channel, the posterior will be wide and the Bayes factor will indicate tension. This is a diagnostic signal: either the experiment was run during an atypical period, or the MMM specification is wrong (perhaps omitting a confounding variable). Both are worth investigating.
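Here is a minimal sketch of Steps 2 and 3 in PyMC, using simulated data; the prior values (2.0 +/- 0.4) are hypothetical, and a production MMM would add adstock and saturation transforms, seasonality, and control variables:

```python
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(42)
n_weeks = 104  # two years of weekly observations

# Simulated weekly spend (in $000s) and revenue; stand-ins for pipeline data.
spend_fb = rng.uniform(40, 60, n_weeks)
spend_gg = rng.uniform(30, 70, n_weeks)
revenue = 200 + 2.1 * spend_fb + 1.4 * spend_gg + rng.normal(0, 20, n_weeks)

with pm.Model() as calibrated_mmm:
    intercept = pm.Normal("intercept", mu=0, sigma=100)
    # Step 2: a geo-experiment measured Facebook's marginal effect at 2.0 +/- 0.4,
    # encoded as an informative prior on the channel coefficient.
    beta_fb = pm.Normal("beta_fb", mu=2.0, sigma=0.4)
    # No experiment for Google yet: weakly informative prior.
    beta_gg = pm.Normal("beta_gg", mu=0, sigma=10)
    noise = pm.HalfNormal("noise", sigma=50)
    mu = intercept + beta_fb * spend_fb + beta_gg * spend_gg
    pm.Normal("obs", mu=mu, sigma=noise, observed=revenue)
    # Step 3: the posterior blends the experimental prior with the time series.
    trace = pm.sample(1000, tune=1000, chains=2, random_seed=42)

print(az.summary(trace, var_names=["beta_fb", "beta_gg"]))
```

If the posterior for beta_fb drifts far from 2.0 despite the tight prior, that is exactly the Step 4 tension diagnostic worth investigating.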
The pattern in a representative comparison is consistent. The uncalibrated MMM and both MTA variants substantially understate Facebook's CPA -- they overvalue the channel. The geo-experiment reveals the true incremental cost. The Bayesian MMM, calibrated with the experimental prior, produces an estimate close to the experimental truth while also incorporating the time-series dynamics that the experiment alone cannot capture (saturation effects, lagged responses, seasonal interaction).
This is the core value proposition of Bayesian calibration: you get experimental accuracy plus the modeling benefits of MMM (full response curves, cross-channel coverage, forward projections) in a single framework.
Meta's open-source Robyn library and Google's LightweightMMM (now Meridian) both support this calibration approach. The technical barrier to implementation has dropped substantially. The organizational and data infrastructure barriers remain significant, which we address below.
Google's Triangulation Methodology
Google's internal measurement science team has published extensively on what they call "triangulation" -- the practice of comparing and reconciling estimates from multiple measurement methods. Their approach, documented across several peer-reviewed papers and industry presentations, codifies several principles worth adopting.
Principle 1: No single method gets a veto. Even experiments have limitations (finite test duration, specific market conditions, potential contamination). Triangulation treats each method as providing a noisy signal about the true causal effect, with experiments receiving the highest weight but not infinite weight.
Principle 2: Disagreement is information. When MMM and experimentation disagree on a channel's value, the disagreement itself is diagnostic. It can reveal model misspecification, confounded time-series, or experimental design flaws. Rather than ignoring disagreement, triangulation treats it as the most valuable output of the system.
Principle 3: The reconciled estimate should have a wider confidence interval than any individual estimate. This is counterintuitive but critical. If your experiment says Facebook CPA is $8 and your MMM says $5, the reconciled estimate should not simply split the difference at $6.50. The very fact of disagreement implies additional uncertainty. The correct reconciled estimate should reflect that model uncertainty, producing a point estimate near $6.50 with an interval spanning something like $12.
Principle 4: Triangulation requires a regular cadence. A one-time comparison is a snapshot. A quarterly triangulation cycle -- where experimental evidence is compared to model estimates, discrepancies are investigated, and models are recalibrated -- creates a continuously improving measurement system.
Reconciling Aggregate and User-Level Views
One of the deepest tensions in unified measurement is the mismatch between MMM's aggregate view and MTA's user-level view. They operate at different units of analysis, and reconciling them requires a bridging mechanism.
MMM says: "Each additional $1,000 spent on Facebook generates $3,200 in revenue." This is an aggregate marginal effect.
MTA says: "User #47291 saw a Facebook ad, clicked a Google ad, and converted. Google gets 60% credit, Facebook gets 40%." This is a user-level attribution.
The problem is not that these are different numbers. The problem is that they live in different conceptual frames, and marketing teams try to use them interchangeably. The Facebook team quotes the MMM number to argue for more budget. The Google team quotes the MTA number. Both are technically correct within their own frame. Both are misleading when treated as the complete picture.
The bridging mechanism works in two directions:
Top-down calibration. Use MMM (calibrated by experiments) to set channel-level budgets. The MMM provides the total incremental value of each channel at various spend levels. This determines how much to spend on Facebook versus Google versus TV versus podcasts.
Bottom-up optimization. Use MTA (calibrated by MMM channel totals) to optimize within each channel. Once you know the total Facebook budget should be $500,000/month, MTA guides how to allocate that $500,000 across campaigns, audiences, and creatives.
The key constraint: the sum of MTA-attributed conversions within a channel must reconcile to the MMM-estimated incremental conversions for that channel. If MTA attributes 10,000 conversions to Facebook but the calibrated MMM estimates only 6,000 incremental conversions, MTA is overcounting by 40%. Apply a channel-level deflation factor of 0.6 to all Facebook MTA credits. This preserves the relative ranking of campaigns within Facebook (MTA's strength) while correcting the absolute level to match causal reality (MMM's strength, post-calibration).
Reconciliation Example: Facebook Channel Measurement
| Metric | MTA (Raw) | MMM (Calibrated) | Reconciled | Adjustment |
|---|---|---|---|---|
| Attributed conversions | 10,000 | 6,000 | 6,000 | MMM anchor |
| CPA | $18 | $30 | $30 | MMM anchor |
| Campaign A share of conversions | 45% | N/A | 45% | MTA relative ranking |
| Campaign B share of conversions | 35% | N/A | 35% | MTA relative ranking |
| Campaign C share of conversions | 20% | N/A | 20% | MTA relative ranking |
| Campaign A incremental conversions | 4,500 | N/A | 2,700 | 6,000 x 45% |
| Campaign B incremental conversions | 3,500 | N/A | 2,100 | 6,000 x 35% |
| Campaign C incremental conversions | 2,000 | N/A | 1,200 | 6,000 x 20% |
This reconciliation preserves the best of both worlds. MMM (calibrated by experiments) anchors the total. MTA provides the within-channel distribution. Neither system is discarded. Both are constrained by the other.
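The table's arithmetic, as a sketch you could run over real MTA exports (campaign names and totals are illustrative):

```python
mta_campaign_conversions = {"campaign_a": 4500, "campaign_b": 3500, "campaign_c": 2000}
mmm_incremental_conversions = 6000  # calibrated channel total for Facebook

mta_total = sum(mta_campaign_conversions.values())    # 10,000
deflation = mmm_incremental_conversions / mta_total   # 0.6

# Preserve MTA's relative ranking; correct the absolute level to the MMM anchor.
reconciled = {
    campaign: round(conversions * deflation)
    for campaign, conversions in mta_campaign_conversions.items()
}
print(reconciled)  # {'campaign_a': 2700, 'campaign_b': 2100, 'campaign_c': 1200}
```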
The Data Pipeline Architecture
A unified measurement architecture is only as strong as the data infrastructure beneath it. The pipeline must feed all three methods from consistent data sources while respecting their different granularity requirements.
The architecture has four layers:
Layer 1: Data Collection. Three parallel ingestion streams. First, aggregate channel spend and impression data, refreshed daily, feeding the MMM. Second, user-level event data (ad exposures, clicks, conversions), feeding MTA. Third, experiment configuration and results data (test/control definitions, outcome metrics, confidence intervals), feeding the calibration engine.
Layer 2: Identity and Aggregation. User-level data is resolved through a probabilistic identity graph (where available) and then aggregated to both user-journey level (for MTA) and channel-week level (for MMM). The critical requirement: the same conversion events must appear in both the MTA input and the MMM input. If these pipelines use different conversion definitions or different attribution windows, reconciliation is impossible before you even start.
Layer 3: Model Estimation. Three parallel model runs. The MMM estimates channel coefficients using Bayesian regression with experimental priors. The MTA model assigns user-level credit using a Shapley or Markov approach. The experiment analysis engine computes incremental lift estimates with confidence intervals.
Layer 4: Reconciliation and Output. The reconciliation engine applies top-down calibration (MMM totals constrain MTA allocations), generates the unified view, and feeds the optimization layer that produces budget recommendations.
The most common pipeline failure mode is not technical. It is definitional. When the MMM team defines a "conversion" as a closed-won deal and the MTA platform counts marketing-qualified leads, and the experimentation team measures revenue, the three systems are literally measuring different outcomes. No amount of statistical sophistication can reconcile numbers that describe different things. The first infrastructure task is establishing a single, shared outcome metric -- typically revenue or a revenue-proxy -- that all three systems measure identically.
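One cheap guardrail is an automated parity check that fails before any modeling runs if the two pipelines disagree on totals. A sketch with pandas, where the column names (converted_at, conversion_id, week, conversions) are assumptions about your warehouse schema:

```python
import pandas as pd

def check_conversion_parity(mta_events: pd.DataFrame, mmm_weekly: pd.DataFrame,
                            tolerance: float = 0.02) -> None:
    """Fail loudly if MTA and MMM inputs disagree on weekly conversion totals."""
    # Roll user-level MTA events up to weekly totals.
    mta_weekly = (
        mta_events.assign(week=mta_events["converted_at"].dt.to_period("W"))
        .groupby("week")["conversion_id"].nunique()
    )
    # MMM input is assumed to carry the same weekly Period index.
    mmm_totals = mmm_weekly.set_index("week")["conversions"]
    gap = (mta_weekly - mmm_totals).abs() / mmm_totals
    bad_weeks = gap[gap > tolerance]
    if not bad_weeks.empty:
        raise ValueError(f"Conversion definitions diverge in weeks: {list(bad_weeks.index)}")
```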
Building a Unified Dashboard That Does Not Lie
A unified measurement dashboard must satisfy two seemingly contradictory requirements: it must be simple enough for executives to act on, and it must be honest enough to reflect genuine uncertainty.
Most measurement dashboards fail on the second requirement. They present single-point estimates with no confidence intervals, creating false precision. A dashboard that says "Facebook ROAS: 3.2x" implies a level of certainty that does not exist. A dashboard that says "Facebook ROAS: 2.1x - 4.3x (calibrated), source: MMM + geo-experiment Q3" is honest and actionable.
The dashboard should have three views:
Executive View. Channel-level budget allocation with confidence intervals. Shows the recommended monthly spend per channel, the expected incremental revenue per channel, and the measurement confidence level (high/medium/low based on how recently the channel was experimentally validated). Color-code by confidence: green for experimentally validated within the last two quarters, yellow for model-estimated with stale experimental priors, red for model-estimated with no experimental evidence.
Analyst View. Full triangulation detail. Shows the MMM estimate, MTA estimate, and experimental estimate side by side for each channel, with the reconciled estimate and its derivation logic. Includes the Measurement Discrepancy Register and trend lines showing how estimates have changed over successive calibration cycles.
Tactical View. Within-channel optimization. Shows MTA-derived campaign and audience performance, adjusted by the channel-level calibration factor from the reconciliation layer. This is the view that performance marketers use daily, but with the critical correction that raw MTA numbers have been deflated (or inflated) to match calibrated channel totals.
When to Use Which Method
The unified architecture does not mean using all three methods for every decision. Each method has an optimal decision context.
Use MMM for: Annual and quarterly budget allocation across channels. Long-term strategic planning. Evaluating the impact of macroeconomic factors on marketing effectiveness. Modeling saturation curves and diminishing returns. Any decision that requires a portfolio view across all channels simultaneously.
Use MTA for: Within-channel campaign optimization. Creative performance evaluation. Audience segment prioritization. Real-time bid management. Any decision that requires user-level granularity and near-real-time response.
Use Experimentation for: Validating or invalidating the assumptions of MMM and MTA. Measuring the incremental value of a channel when there is high uncertainty or disagreement between models. Evaluating new channels or major strategy changes where historical data does not exist. Calibrating models at least once per year per major channel.
Use the reconciled estimate for: Budget allocation decisions above $100,000/quarter. Any cross-channel reallocation. Board-level reporting on marketing effectiveness. Marketing team performance evaluation (if you must tie measurement to incentives, tie it to the reconciled number, not to any single model's output).
Decision-Method Mapping
| Decision Type | Primary Method | Supporting Method | Update Frequency |
|---|---|---|---|
| Annual channel budget | Calibrated MMM | Experiment results as priors | Quarterly |
| Quarterly reallocation | Calibrated MMM | Recent experiment results | Quarterly |
| Campaign-level optimization | Calibrated MTA | MMM channel totals as constraint | Weekly |
| Creative testing | MTA + in-platform experiments | None required | Continuous |
| New channel evaluation | Experiment (geo or holdout) | Post-hoc MMM inclusion | Per launch |
| Incrementality validation | Geo-experiment | Compare to MMM/MTA estimates | 2-4x per year per major channel |
Scaling Experimentation Cadence
The unified architecture depends on experiments as the calibration anchor. But experiments are expensive and disruptive. The practical question is: how many experiments do you need, and how do you prioritize them?
The answer depends on spend concentration and model uncertainty.
Spend concentration. Most marketing portfolios follow a power law. Three to five channels account for 70-80% of spend. These high-spend channels are where miscalibration causes the most dollar-weighted damage. Prioritize experimental validation for these channels first.
Model uncertainty. When MMM and MTA agree on a channel's value (within 20%), the urgency for experimental validation is lower. When they disagree by more than 30%, the channel should move to the top of the experimentation queue. Disagreement is a signal that at least one model is wrong, and the dollar impact of that error scales with channel spend.
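A simple way to operationalize the queue is to rank channels by dollar-weighted disagreement. A sketch with hypothetical numbers:

```python
channels = {
    # name: (annual spend in $M, MMM ROAS estimate, MTA ROAS estimate)
    "paid_search": (30.0, 2.8, 3.0),
    "paid_social": (24.0, 2.1, 3.4),
    "podcast": (6.0, 4.0, 1.2),
}

def test_priority(spend, mmm_roas, mta_roas):
    """Dollar-weighted relative disagreement between the two models."""
    gap = abs(mmm_roas - mta_roas) / ((mmm_roas + mta_roas) / 2)
    return spend * gap

queue = sorted(channels, key=lambda c: test_priority(*channels[c]), reverse=True)
print(queue)  # ['paid_social', 'podcast', 'paid_search']
```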
A practical annual cadence for a company spending $50-100 million on marketing:
- Q1: Experiment on the highest-spend channel (typically paid social or paid search). Use results to recalibrate MMM.
- Q2: Experiment on the channel with the largest MMM-MTA discrepancy. Investigate the cause of disagreement.
- Q3: Experiment on a mid-tier channel or a channel where spend is being considered for significant increase. Generate evidence before committing budget.
- Q4: Re-test the highest-spend channel if macro conditions have shifted significantly (e.g., post-holiday seasonality, competitive entry). Use all Q1-Q4 results for the annual calibration cycle.
This cadence produces four to six experimental calibration points per year. For companies spending under $20 million annually, two experiments per year (one each half) on the top two channels is a reasonable minimum.
The cost of experimentation is real -- you forgo some revenue in the holdout markets during the test period. But the cost of not experimenting is higher. A 2019 analysis by the Marketing Science Institute estimated that the average Fortune 500 company misallocates 20-30% of its marketing budget due to measurement error. For a company spending $80 million, that is $16-24 million in annual waste. Four geo-experiments costing $500,000 each in foregone revenue are a bargain against that baseline.
Budget Optimization Using Triangulated Estimates
Once the measurement architecture is producing calibrated, reconciled estimates, the final step is using those estimates for budget optimization.
The optimization problem is conceptually simple: allocate the total marketing budget across channels to maximize total incremental revenue (or profit), subject to the constraint that each channel has a diminishing returns curve. Formally, the triangulated estimate for a channel combines evidence from all three methods:
$$\hat{\beta}_{\text{channel}} = w_{\text{exp}}\,\hat{\beta}_{\text{exp}} + w_{\text{MMM}}\,\hat{\beta}_{\text{MMM}} + w_{\text{MTA}}\,\hat{\beta}_{\text{MTA}}, \qquad w_i \propto \frac{1}{\sigma_i^2}$$

where each weight $w_i$ is inversely proportional to the variance $\sigma_i^2$ of the respective estimate, giving experiments the highest influence.
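In code, this is inverse-variance weighting, with a between-method disagreement term added so that conflicting inputs widen the combined interval (Principle 3 above). A sketch with hypothetical CPA estimates:

```python
import numpy as np

def triangulate(estimates, std_errors):
    """Inverse-variance weighted combination of method-level estimates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    weights = (1.0 / se**2) / (1.0 / se**2).sum()
    combined = float(weights @ est)
    within = float(weights**2 @ se**2)              # propagated per-method uncertainty
    between = float(weights @ (est - combined)**2)  # extra spread from disagreement
    return combined, np.sqrt(within + between)

# Hypothetical Facebook CPA: experiment, calibrated MMM, deflated MTA.
cpa, cpa_se = triangulate([42.0, 38.0, 30.0], [8.0, 10.0, 15.0])
print(f"Triangulated CPA: ${cpa:.2f} +/- ${cpa_se:.2f}")
```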
The calibrated MMM provides the response curves -- the relationship between spend and incremental outcome for each channel. At low spend levels, each additional dollar generates high marginal returns. As spend increases, the channel saturates and marginal returns decline. The optimal allocation equalizes marginal returns across all channels: the last dollar spent on Facebook should generate the same incremental value as the last dollar spent on Google or TV.
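A sketch of the allocation step using scipy, with hypothetical log-shaped response curves standing in for the calibrated MMM output:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical response curves: incremental revenue = a * log(1 + spend / b),
# where a sets the scale and b sets how quickly the channel saturates.
curves = {"facebook": (9000, 150), "google": (11000, 200), "linear_tv": (30000, 2000)}
total_budget = 1500.0  # $000s per month

def neg_revenue(spend):
    return -sum(a * np.log1p(s / b) for s, (a, b) in zip(spend, curves.values()))

n = len(curves)
result = minimize(
    neg_revenue,
    x0=np.full(n, total_budget / n),  # start from an even split
    bounds=[(0.0, total_budget)] * n,
    constraints=[{"type": "eq", "fun": lambda s: s.sum() - total_budget}],
)
for channel, spend in zip(curves, result.x):
    print(f"{channel}: ${spend:,.0f}k/month")
```

At the optimum, the marginal return a / (b + s) is equalized across channels, which is exactly the first-order condition described above.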
The typical pattern: digital channels (Facebook, Google) have high initial ROAS that declines steeply with scale, while traditional channels (linear TV) have lower initial ROAS but flatter curves. At low total budgets, digital dominates. At high total budgets, the optimal mix shifts toward traditional channels because digital has saturated.
The critical nuance: the confidence intervals on these curves matter as much as the point estimates. A channel where the marginal ROAS is estimated at 2.5x +/- 0.3 (recently validated by experiment) should receive a larger allocation than a channel estimated at 3.0x +/- 1.5 (no experimental validation, high model uncertainty). Risk-adjusted optimization penalizes uncertain estimates, which creates a natural incentive to run experiments -- validated channels receive larger allocations.
The optimization should run monthly, using the latest calibrated model. Quarterly, the model itself is recalibrated with any new experimental evidence. Annually, a full triangulation review reassesses the architecture, retires stale priors, and updates the experimentation roadmap.
Organizational Challenges: Who Owns the Truth?
The technical architecture is the easier half of the problem. The organizational architecture is where most unified measurement initiatives die.
In a typical marketing organization, three different teams own the three methods:
- The data science or analytics team owns the MMM. They are methodologically sophisticated and think in terms of econometric models and Bayesian inference.
- The performance marketing team owns MTA. They live in platform dashboards (Google Ads, Meta Business Suite) and think in terms of ROAS, CPA, and campaign-level optimization.
- The growth or experimentation team owns incrementality testing. They are closest to product and engineering and think in terms of statistical power, holdout design, and causal inference.
These teams report to different managers, operate on different timelines, and have different incentive structures. The data science team publishes a quarterly MMM report. The performance team monitors MTA daily. The experimentation team runs two to four tests per year. They rarely compare notes.
The organizational solution has three components:
Component 1: A Measurement Council. A cross-functional group that meets monthly, comprising one representative from each of the three teams plus a senior marketing leader with budget authority. The council's mandate: review the Measurement Discrepancy Register, prioritize the experimentation roadmap, and sign off on the reconciled estimates that inform budget decisions.
Component 2: Shared incentives. If the performance marketing team is incentivized on MTA-reported ROAS, they will resist any calibration that deflates their numbers. If the data science team is incentivized on model accuracy, they will resist experimental evidence that invalidates their model. Align all three teams on a shared outcome metric -- typically incremental revenue or incremental profit as measured by the reconciled system. This is uncomfortable. It is also necessary.
Component 3: A single source of record. The reconciled dashboard is the official measurement output. No team is permitted to present uncalibrated model outputs in budget discussions. This requires executive enforcement and cultural change. It also requires the reconciled system to be transparent about its methodology and uncertainty -- if people cannot understand why their numbers changed, they will not trust the system.
The transition takes time. A realistic timeline: six months to build the data infrastructure and run the first round of calibrating experiments. Twelve months to establish the reconciliation process and the Measurement Council cadence. Eighteen months to shift organizational culture so that calibrated, reconciled estimates are the default language of budget discussions. Two years to reach steady state.
That timeline sounds long. It is short compared to the alternative, which is continuing to misallocate 20-30% of your marketing budget indefinitely because three measurement systems cannot agree and nobody has built the connective tissue to make them.
Putting It Together
Unified measurement is not a product you buy. It is an architecture you build -- part statistical machinery, part data infrastructure, part organizational governance.
The framework rests on three principles. First, establish a measurement hierarchy with experiments at the top. When experiments contradict models, recalibrate the models. Second, use Bayesian calibration to propagate experimental evidence through your MMM, extending causal estimates to channels and time periods where experiments have not been run. Third, reconcile aggregate and user-level views by using calibrated MMM totals as constraints on MTA allocations, preserving the granularity of attribution within the accuracy of econometrics.
The hardest part is not the math. The math is well-established and increasingly accessible through open-source tools. The hardest part is building an organization that prefers uncomfortable truths to convenient narratives. That means shared incentives across measurement teams, executive commitment to the measurement hierarchy, and the discipline to run experiments even when they reveal that your favorite channel is less effective than you thought.
Three kingdoms producing three answers is the default. A single coherent picture is the goal. The path between them is paved with experiments, calibration cycles, and the organizational courage to let the evidence lead.
Further Reading
- Robyn by Meta (GitHub) — Open-source MMM
- Meridian by Google (GitHub) — Bayesian MMM
- CausalImpact by Google — Time series causal inference
References

- Chan, D., & Perry, M. (2017). Challenges and opportunities in media mix modeling. Google Research Working Paper.
- Gordon, B. R., Zettelmeyer, F., Bhatt, N., & Arora, S. (2019). Close enough? A large-scale exploration of non-experimental approaches to advertising measurement. Marketing Science, 38(6), 895-914.
- Vaver, J., & Koehler, J. (2011). Measuring ad effectiveness using geo experiments. Google Technical Report.
- Jin, Y., Wang, Y., Sun, Y., Chan, D., & Koehler, J. (2017). Bayesian methods for media mix modeling with carryover and shape effects. Google Technical Report.
- Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal impact using Bayesian structural time series models. Annals of Applied Statistics, 9(1), 247-274.
- Marketing Science Institute. (2019). Marketing budget allocation: Current practices and opportunities for improvement. MSI Working Paper Series.
- Shapley, L. S. (1953). A value for n-person games. In H. W. Kuhn & A. W. Tucker (Eds.), Contributions to the Theory of Games (Vol. 2, pp. 307-317). Princeton University Press.
- Sun, Y., Wang, Y., Jin, Y., Chan, D., & Koehler, J. (2017). Geo-level Bayesian hierarchical media mix modeling. Google Technical Report.
- Zettelmeyer, F. (2020). Measuring marketing effectiveness: Methods and pitfalls. Kellogg School of Management Working Paper.
- Angrist, J. D., & Pischke, J. S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC.
- Google. (2023). Meridian: An open-source media mix model. Google AI Blog.