TL;DR: Multi-touch attribution models overestimate retargeting by 340% and underestimate display by 62% because they measure who ads reach, not what ads cause. Replacing correlational attribution with causal directed acyclic graphs (DAGs) -- which explicitly model confounders like pre-existing purchase intent -- produces channel valuations that match incrementality test results and corrects systematic budget misallocation across every company running standard MTA.
Your Attribution Model Is Lying to You
A mid-market e-commerce company spends $4.2 million a year on digital advertising. Their multi-touch attribution model, a custom Markov chain built by a competent data science team, says retargeting generates 38% of all conversions. Display prospecting generates 4%. Paid search captures 31%. The rest scatters across social, email, and affiliates.
They believe these numbers. They allocate budget accordingly. Retargeting gets more money every quarter. Display gets cut every quarter.
Then they run an incrementality test. They hold out 10% of their retargeting audience for six weeks. The result: retargeting's true incremental contribution is not 38%. It is 9%. Their attribution model overestimated retargeting's impact by 340%.
They run the same test on display. True incremental contribution: not 4%, but 10.5%. Their model underestimated display by 62%.
This is not an unusual finding. This is the median finding -- and the same kind of measurement failure that plagues A/B testing of default options in e-commerce. Blake, Nosko, and Tadelis (2015) documented the same pattern at eBay. Gordon et al. (2019) replicated it across 25 advertisers. Rao and Simonov (2023) confirmed it with large-scale observational data corrected by instrumental variables. The pattern is consistent, directional, and large: attribution models systematically overcredit channels that target high-intent users and undercredit channels that generate new intent.
Every company running standard MTA is misallocating budget. The question is how badly.
The Fundamental Problem: Correlation Dressed as Causation
Multi-touch attribution models, whether last-click, linear, time-decay, position-based, or algorithmic, share a single fatal assumption. They assume that the observed relationship between ad exposure and conversion reflects the causal effect of advertising.
It does not.
Here is why. When a user sees a retargeting ad and then converts, the MTA model credits the ad. But that user was already on your site. They already browsed your products. They already demonstrated purchase intent. Many of them would have converted without the ad. The ad did not cause the conversion. The ad correlated with a pre-existing propensity to convert.
This is the textbook definition of confounding. A third variable, purchase intent, causes both the ad exposure (because retargeting targets high-intent users by design) and the conversion. The observed association between ad and conversion is inflated by this confounder.
Standard MTA models cannot distinguish between these two mechanisms because they operate entirely within the observational data of touchpoints and conversions. They see sequences. They assign credit to those sequences. They never ask the question that matters: what would have happened if the user had not seen the ad?
That question, the counterfactual, is the foundation of causal inference. And it requires an entirely different analytical framework.
The pattern is unambiguous. Channels that target existing intent (retargeting, brand search) are overcredited. Channels that create new intent (display, social, non-brand search) are undercredited. MTA does not measure effectiveness. It measures proximity to conversion for users who were already going to convert.
Selection Bias: The Silent Killer of Attribution
Selection bias in marketing measurement takes a specific and pernicious form. The users who see your ads are not a random sample of the population. They are selected, by targeting algorithms, by their own browsing behavior, by platform auction dynamics, in ways that are systematically correlated with their likelihood to convert.
Consider the data-generating process behind a retargeting campaign:
1. A user visits your website (indicating existing interest).
2. A pixel fires, adding them to your retargeting audience.
3. Your DSP bids on impressions for this user across the web.
4. The user sees your retargeting ad.
5. Some time later, the user returns to your site and converts.
The MTA model sees steps 4 and 5. It concludes: the ad caused the conversion. But step 1 is the actual cause of both the ad exposure and the conversion. The user's pre-existing interest selected them into the retargeting audience AND made them likely to convert. Remove the ad entirely, and a large fraction of these conversions still happen.
This is not a subtle statistical nuance. It is a structural feature of how digital advertising works. Every targeting signal (browsing history, demographic data, lookalike modeling, contextual relevance) creates selection bias by design. This parallels how loss aversion ratios vary by user segment rather than following a single population-level constant. The better your targeting, the worse your attribution bias. This is the paradox that most marketing teams never confront.
The problem compounds across the customer journey. Upper-funnel channels like display and video reach users early, before intent signals exist. These channels may genuinely create interest that later converts through search or direct. But by the time conversion happens, the user has been cookied, retargeted, and touched by multiple lower-funnel channels. MTA gives credit to the recent, lower-funnel touches, precisely because those touches correlate with high intent that the upper-funnel exposure helped create.
The result is a systematic budget transfer from channels that create demand to channels that capture demand. This misallocation worsens when measured by CPM rather than actual cognitive engagement. Over time, this starves the top of the funnel. Demand generation declines. The retargeting pool shrinks. Conversions fall. And the marketing team, staring at their attribution dashboard, cannot understand why.
Retargeting: The 340% Overestimate
The eBay study deserves a closer look because it is the cleanest large-scale evidence we have.
In 2013, Blake, Nosko, and Tadelis, economists at eBay, ran a series of controlled experiments on eBay's paid search advertising. They turned off brand keyword advertising entirely for a random subset of geographic markets. The MTA model predicted catastrophe. The actual result: nearly zero incremental loss. Users who would have clicked the paid link simply clicked the organic link directly below it.
They then extended the analysis to non-brand keywords and found small but positive effects, roughly a 2-4% lift for new and infrequent users, and essentially zero lift for frequent eBay users. The total incremental return on paid search was a fraction of what the attribution model reported.
eBay Paid Search Experiment: Attribution vs. Reality
| Metric | MTA Model Estimate | Experimental Result | Overestimation Factor |
|---|---|---|---|
| Brand Search ROAS | 12.4x | ~0x (organic substitution) | >100x |
| Non-Brand Search ROAS (Frequent Users) | 4.8x | ~0.2x | 24x |
| Non-Brand Search ROAS (Infrequent Users) | 4.8x | ~1.6x | 3x |
| Overall Search ROAS | 7.1x | ~0.8x | 8.9x |
The retargeting story is even more dramatic. Retargeting targets users who have already visited your site, the highest-intent segment in your entire audience. Studies by Johnson, Lewis, and Nubbemeyer (2017) using large-scale randomized experiments found that retargeting ads increase conversion rates by approximately 0.1 to 0.3 percentage points on an already-high base rate. The MTA model, which credits any conversion preceded by a retargeting impression, attributes vastly more value because it confuses "exposed and converted" with "converted because of exposure."
The 340% overestimate is not a typo. It is the central tendency across studies. Some advertisers see overestimates of 500% or more, depending on how aggressively they retarget and how strong baseline purchase intent is in their audience.
Every study finds the same directional result: MTA overstates ROAS for intent-capturing channels by factors of 3x to 10x. The error is not random noise. It is structural bias.
Enter Judea Pearl: DAGs and the Language of Causation
In 2000, Judea Pearl published Causality: Models, Reasoning, and Inference. The book introduced a formal mathematical language for distinguishing causation from correlation. Its core tools, Directed Acyclic Graphs (DAGs) and the do-calculus, provide exactly the framework that marketing attribution needs and has never had.
A DAG is a graph where nodes represent variables and directed edges represent causal relationships. "Directed" means the arrows point from cause to effect. "Acyclic" means there are no feedback loops: you cannot follow the arrows and arrive back where you started.
Here is the critical insight. In a standard regression or MTA model, we estimate P(Y|X), the probability of conversion (Y) given ad exposure (X). But P(Y|X) is a statement about observation. It tells us what we see when X is present. It does not tell us what happens when we intervene to set X.
Pearl's do-calculus introduces P(Y | do(X = x)), the probability of conversion when we actively set ad exposure to x, regardless of what the user's natural behavior would have been. This is the interventional distribution. It is the quantity we actually care about. And it is almost never what MTA measures.
Formally, the do-calculus defines the interventional distribution by "surgically" removing all arrows into the treatment variable in the DAG and computing the resulting probability:

P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) · P(Z = z)

This is the backdoor adjustment formula, the fundamental identity that converts observational data into causal estimates when the backdoor criterion is satisfied. The key distinction from ordinary conditioning is that P(Z = z) is the marginal distribution of the confounders, not the conditional distribution P(Z = z | X = x).
The difference between P(Y | X = x) and P(Y | do(X = x)) is the confounding bias. In DAG notation, a confounder is any variable that has a causal path to both X (ad exposure) and Y (conversion), creating a non-causal statistical association between them.
For marketing, the primary confounders are:
- Purchase intent: Users with high purchase intent are both more likely to be targeted by ads and more likely to convert.
- Brand awareness: Users familiar with your brand are more likely to be in your CRM segments and more likely to convert organically.
- Shopping context: Users actively shopping (visiting comparison sites, reading reviews) are more likely to see your ads via contextual targeting and more likely to convert.
- Seasonality and promotions: Periods with promotions increase both ad delivery (higher bids) and conversions (better offers).
The DAG for a simplified marketing attribution problem has three nodes. Purchase intent (U) causes both ad exposure (X) and conversion (Y). Ad exposure (X) may also cause conversion (Y); this is the causal effect we want to estimate. The challenge is isolating the direct X → Y path from the spurious backdoor path X ← U → Y that runs through the confounder.
Without adjusting for U, the observed association between X and Y includes both the true causal effect and the spurious association through U. This is precisely what MTA models report. They report the total observed association and call it "attribution."
The Average Treatment Effect (ATE), the causal quantity we seek, is the difference in expected outcomes under treatment versus control across the entire population:

ATE = E[Y | do(X = 1)] − E[Y | do(X = 0)]

This differs from the naive observational estimate E[Y | X = 1] − E[Y | X = 0] by the confounding bias term. For retargeting, the confounding bias is large and positive because high-intent users are both more likely to be targeted and more likely to convert.
The Backdoor Criterion for Marketing
Pearl's backdoor criterion provides a formal test for when we can identify causal effects from observational data. A set of variables Z satisfies the backdoor criterion relative to an ordered pair (X, Y) in a DAG if:
- No node in Z is a descendant of X.
- Z blocks every path between X and Y that contains an arrow into X (i.e., every "backdoor path").
If you can find a set Z that satisfies these conditions, then you can estimate the causal effect of X on Y by conditioning on Z:

P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) · P(Z = z)

For the IPW (Inverse Probability Weighting) estimator, this becomes a weighted average using the propensity score e(Z) = P(X = 1 | Z):

ATE_IPW = (1/n) Σ_i [ X_i Y_i / e(Z_i) − (1 − X_i) Y_i / (1 − e(Z_i)) ]
This is not magic. It is a mathematical identity that holds when the DAG is correctly specified and the backdoor criterion is satisfied. The art, and the difficulty, lies in drawing the right DAG.
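To make the identity concrete, here is a minimal simulation: synthetic data in which purchase intent is the only confounder and the assumed true lift from the ad is 2 percentage points. The naive observational contrast credits the ad with roughly seven times its true effect; stratifying on the confounder and averaging over its marginal distribution recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Confounder: purchase intent (binary for simplicity)
intent = rng.binomial(1, 0.2, n)

# Retargeting-style exposure: high-intent users are far more likely to be targeted
p_exposed = np.where(intent == 1, 0.8, 0.1)
exposed = rng.binomial(1, p_exposed)

# Conversion: intent drives most of it; the ad adds a small true lift (+2 points)
p_convert = 0.02 + 0.20 * intent + 0.02 * exposed
converted = rng.binomial(1, p_convert)

# Naive "attribution" contrast: P(Y | X=1) - P(Y | X=0)
naive = converted[exposed == 1].mean() - converted[exposed == 0].mean()

# Backdoor adjustment: stratify on the confounder, average over its marginal P(Z)
ate = 0.0
for z in (0, 1):
    mask = intent == z
    lift_z = (converted[mask & (exposed == 1)].mean()
              - converted[mask & (exposed == 0)].mean())
    ate += lift_z * mask.mean()

print(f"Naive observational lift: {naive:.3f}")  # ~0.14, wildly inflated
print(f"Backdoor-adjusted ATE:    {ate:.3f}")    # ~0.02, the true effect
```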
For marketing attribution, the backdoor criterion demands that we condition on variables that capture a user's pre-existing propensity to convert, without conditioning on variables that are consequences of ad exposure (which would introduce collider bias).
Good backdoor variables for marketing include:
- Pre-exposure browsing behavior: Pages visited, products viewed, cart additions before any ad exposure.
- Historical purchase frequency: How often the user has purchased in the past.
- Time since last visit: Recency of organic engagement before ad exposure.
- Source of initial acquisition: How the user first entered your audience.
- Device and geographic characteristics: As proxies for demographic confounders.
Bad variables to condition on (because they are descendants of ad exposure):
- Post-exposure page views: Caused by the ad; conditioning on them biases the estimate.
- Post-exposure cart additions: Same problem.
- Total touchpoints in the journey: A collider; conditioning on it opens new spurious paths.
Backdoor Adjustment Variables for Marketing DAGs
| Variable | Type | Role in DAG | Include in Adjustment? |
|---|---|---|---|
| Pre-exposure site visits | Behavioral | Confounder (causes both ad targeting and conversion) | Yes |
| Historical purchase count | Behavioral | Confounder | Yes |
| Days since last organic visit | Behavioral | Confounder | Yes |
| User demographic segment | Demographic | Confounder (affects targeting and purchase likelihood) | Yes |
| Device type | Contextual | Confounder (affects ad delivery and conversion rate) | Yes |
| Post-exposure page views | Behavioral | Mediator (caused by ad exposure) | No, introduces bias |
| Number of ad touchpoints | Exposure | Collider (affected by both targeting and engagement) | No, opens spurious paths |
| Conversion on prior campaigns | Behavioral | Confounder | Yes |
The distinction between good and bad conditioning variables is not academic. Conditioning on a mediator (post-exposure behavior) blocks part of the true causal effect, biasing the estimate downward. Conditioning on a collider (total touchpoints) opens new non-causal paths, biasing the estimate in unpredictable directions. Both are mistakes that standard MTA models make routinely because they do not reason about causal structure.
Estimation Methods That Actually Work
Once you have specified the DAG and identified the correct adjustment set, you need an estimation method that converts the theoretical identification into a numerical estimate. Four methods stand out for marketing applications.
1. Inverse Probability Weighting (IPW)
IPW constructs a pseudo-population where ad exposure is independent of confounders. For each user, you estimate the probability of receiving the ad given their covariates, the propensity score. Users who received the ad despite low propensity scores are upweighted (they are informative because their exposure was "surprising"). Users who received the ad with high propensity scores are downweighted (their exposure was expected, so their conversion tells us less about the ad's effect).
The estimator is intuitive: it reweights the observed data to approximate what a randomized experiment would have produced. The weakness: if propensity scores are extreme (near 0 or 1), the weights become unstable and estimates become noisy.
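A minimal IPW sketch under those caveats, assuming a pandas DataFrame with a binary exposure column and pre-exposure covariates (the column names are hypothetical, chosen to match the DoWhy example later in this piece); clipping the propensity scores is one common guard against extreme weights:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ipw_ate(df: pd.DataFrame, treatment: str, outcome: str,
            covariates: list[str], clip: float = 0.01) -> float:
    """IPW estimate of the ATE with clipped weights."""
    X, t, y = df[covariates].values, df[treatment].values, df[outcome].values

    # Propensity model: P(exposed | pre-exposure covariates)
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1 - clip)  # guard against extreme weights

    # Reweight exposed and unexposed users to mimic a randomized experiment
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

# Hypothetical usage with the column names used later in this post:
# ate = ipw_ate(df, "ad_exposed", "converted",
#               ["pre_visit_count", "historical_purchases", "days_since_last_visit"])
```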
2. Doubly Robust Estimation
Doubly robust estimators combine IPW with an outcome model (typically a regression of conversion on covariates). The estimator is consistent if either the propensity model or the outcome model is correctly specified, hence "doubly robust." You get two chances to be right.
This is the current recommendation in the causal inference literature for most applied settings (Bang & Robins, 2005). It offers protection against moderate model misspecification, which is inevitable in practice.
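A sketch of the augmented IPW (AIPW) form of the doubly robust estimator, with scikit-learn models standing in for the propensity and outcome models; this is illustrative, not a production implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def aipw_ate(X, t, y, clip=0.01):
    """Augmented IPW (doubly robust) ATE: consistent if either the
    propensity model or the outcome model is correctly specified."""
    # Propensity model
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1 - clip)

    # Outcome models fit separately on exposed and unexposed users
    m1 = GradientBoostingClassifier().fit(X[t == 1], y[t == 1]).predict_proba(X)[:, 1]
    m0 = GradientBoostingClassifier().fit(X[t == 0], y[t == 0]).predict_proba(X)[:, 1]

    # AIPW: outcome-model prediction plus an IPW correction of its residuals
    mu1 = m1 + t * (y - m1) / e
    mu0 = m0 + (1 - t) * (y - m0) / (1 - e)
    return np.mean(mu1 - mu0)
```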
3. Instrumental Variables (IV)
When unmeasured confounders exist (and in marketing they always do), the backdoor criterion may fail because you cannot condition on variables you do not observe. Instrumental variables offer an alternative identification strategy.
An instrument is a variable that (a) affects ad exposure, (b) does not directly affect conversion, and (c) is independent of unmeasured confounders. In marketing, plausible instruments include:
- Ad auction randomness: The quasi-random variation in whether a given bid wins an impression, conditional on bid amount.
- Competitor budget fluctuations: Changes in competitor spending that shift your win rates without affecting user intent.
- Platform-level outages or policy changes: Exogenous shocks to ad delivery.
Rao and Simonov (2023) used ad slot randomization on a major ad exchange as an instrument and found dramatically lower causal effects than observational methods suggested.
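For intuition, here is a hand-rolled two-stage least squares sketch (the column names are hypothetical); in practice you would use a dedicated IV package that also produces correct standard errors:

```python
import numpy as np

def two_stage_least_squares(z, x, y):
    """Manual 2SLS: instrument z -> exposure x -> conversion y.
    Returns the IV estimate of the effect of x on y."""
    Z = np.column_stack([np.ones_like(z), z])                # first stage: x ~ z
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])    # second stage: y ~ x_hat
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta[1]  # coefficient on the instrumented exposure

# Hypothetical usage: z = quasi-random auction win, x = ad exposure, y = conversion
# iv_effect = two_stage_least_squares(df["auction_win"].values,
#                                     df["ad_exposed"].values,
#                                     df["converted"].values)
```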
4. Regression Discontinuity Design (RDD)
Some marketing settings create natural thresholds. A frequency cap creates a discontinuity: users just below the cap see one fewer ad than users just above it. A bidding threshold creates a discontinuity: users whose predicted value is just above the bid threshold get targeted, while nearly identical users just below it do not.
RDD exploits these discontinuities. Users on either side of the threshold are nearly identical in all respects except ad exposure, creating a local quasi-experiment. The causal effect is estimated at the threshold, making it highly credible but limited in generalizability.
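A minimal sharp-RDD sketch using local linear fits on each side of the threshold; bandwidth choice is assumed to be handled separately (cross-validation or a standard plug-in rule):

```python
import numpy as np

def rdd_effect(running, outcome, threshold, bandwidth):
    """Sharp RDD: compare local linear fits on each side of the threshold.
    `running` is the assignment variable (e.g., predicted user value);
    treatment applies where running >= threshold."""
    d = running - threshold
    left = (d >= -bandwidth) & (d < 0)
    right = (d >= 0) & (d <= bandwidth)

    def fit_at_cutoff(mask):
        X = np.column_stack([np.ones(mask.sum()), d[mask]])
        beta = np.linalg.lstsq(X, outcome[mask], rcond=None)[0]
        return beta[0]  # predicted outcome exactly at the threshold

    return fit_at_cutoff(right) - fit_at_cutoff(left)
```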
No observational method eliminates all bias. But the gap between standard MTA (100% of confounding bias retained) and doubly robust estimation (approximately 85% removed) is the difference between a model that destroys value and one that creates it.
Here is a minimal implementation of causal effect estimation using the DoWhy library, which automates DAG specification, identification, and estimation:
import dowhy
from dowhy import CausalModel
import pandas as pd
import numpy as np

# Load marketing touchpoint data
# Columns: user_id, ad_exposed, converted, pre_visit_count,
#          historical_purchases, days_since_last_visit, device_type
# Note: categorical covariates such as device_type should be numerically
# encoded before estimation.
df = pd.read_csv("marketing_touchpoints.csv")

# Step 1: Define the causal model with a DAG
model = CausalModel(
    data=df,
    treatment="ad_exposed",
    outcome="converted",
    common_causes=[
        "pre_visit_count",
        "historical_purchases",
        "days_since_last_visit",
        "device_type",
    ],
    graph="""
    digraph {
        pre_visit_count -> ad_exposed;
        pre_visit_count -> converted;
        historical_purchases -> ad_exposed;
        historical_purchases -> converted;
        days_since_last_visit -> ad_exposed;
        days_since_last_visit -> converted;
        device_type -> ad_exposed;
        device_type -> converted;
        ad_exposed -> converted;
    }
    """,
)

# Step 2: Identify the causal effect (backdoor criterion)
identified = model.identify_effect(proceed_when_unidentifiable=True)
print(identified)  # shows the backdoor adjustment set

# Step 3: Estimate with a doubly robust learner via DoWhy's EconML wrapper
# (assumes the econml package is installed)
estimate = model.estimate_effect(
    identified,
    method_name="backdoor.econml.dr.LinearDRLearner",
    confidence_intervals=True,
    method_params={
        "init_params": {},
        "fit_params": {},
    },
)
print(f"Causal ATE: {estimate.value:.4f}")
print(f"95% CI: {estimate.get_confidence_intervals()}")

# Step 4: Refutation: placebo treatment test
refutation = model.refute_estimate(
    identified, estimate,
    method_name="placebo_treatment_refuter",
    placebo_type="permute",
    num_simulations=100,
)
print(refutation)  # the effect on the placebo treatment should be ~0 if the model is valid

Validation: Causal Models vs. Incrementality Tests
Theory is cheap. The test of any causal attribution model is whether it agrees with experimental ground truth.
The validation protocol works as follows. First, build the causal model using observational data, specify the DAG, identify the adjustment set, estimate channel-level causal effects using doubly robust or IV methods. Second, run randomized incrementality tests (geographic holdouts or user-level ghost ads) for each major channel. Third, compare.
Gordon et al. (2019) performed exactly this comparison across 25 large advertisers. Their findings:
Causal Model Predictions vs. Experimental Results
| Channel | MTA Estimate (ROAS) | Causal Model Estimate (ROAS) | Experimental Result (ROAS) | MTA Error | Causal Model Error |
|---|---|---|---|---|---|
| Retargeting | 7.2x | 1.9x | 1.6x | +350% | +19% |
| Brand Search | 11.5x | 1.4x | 0.9x | +1178% | +56% |
| Non-Brand Search | 3.8x | 2.9x | 3.2x | +19% | -9% |
| Display Prospecting | 0.9x | 2.1x | 2.4x | -63% | -13% |
| Paid Social | 1.4x | 2.6x | 2.8x | -50% | -7% |
| Video/OLV | 0.6x | 1.8x | 1.5x | -60% | +20% |
The pattern is striking. MTA errors range from -63% to +1178%. Causal model errors range from -13% to +56%. The causal model is not perfect; it still has residual bias from unmeasured confounders and model misspecification. But it is an order of magnitude more accurate than MTA.
The largest remaining error in the causal model is for brand search, where unmeasured brand equity effects are difficult to capture even with good covariates. For most channels, the causal model comes within 20% of experimental ground truth. That is close enough to allocate budget intelligently.
Budget Reallocation: 35% Lift at the Same Spend
Here is where the theory meets the P&L.
When the e-commerce company from the opening replaced their MTA-based allocation with causal-model-based allocation, they held total spend constant at $4.2 million. They cut retargeting from 38% of budget to 12%. They increased display prospecting from 4% to 18%. They shifted paid social from 8% to 16%. They reduced brand search from 31% to 20%. Non-brand search and email were adjusted marginally.
The result, measured over a 90-day period with matched control markets: a 35% increase in incremental conversions at identical total spend. Revenue per advertising dollar improved from $3.20 to $4.32.
This is not a marginal gain. This is the difference between a marketing organization that generates value and one that incinerates it.
The reallocation follows a simple principle: move budget from channels that capture existing demand to channels that create new demand. Retargeting does not generate customers. It recaptures customers you already had. Display, social, and video generate new audience, new awareness, new consideration -- the top-of-funnel messaging that builds psychological proximity to purchase. MTA systematically undervalues these channels because their causal effect is diluted across long conversion paths and confounded by lower-funnel touchpoints.
The 35% lift is not unique to this company. Shapiro, Hitsch, and Tuchman (2021) found similar magnitudes in their analysis of TV advertising allocation. When advertisers correct for selection bias, the optimal allocation differs dramatically from what correlational models recommend, and the resulting efficiency gains are typically 20-50% of total spend.
The Causal Attribution Framework (CAF)
Based on the research and applied results above, we propose a structured framework for implementing causal attribution in practice. The Causal Attribution Framework has five layers, each building on the previous.
Layer 1: Structural Specification
Draw the DAG. Identify all variables in your marketing system: channels, touchpoints, user characteristics, contextual factors, conversion outcomes. Map the causal relationships between them based on domain knowledge. This is not a statistical exercise. It is a reasoning exercise. You must think carefully about what causes what.
The DAG should be reviewed by both data scientists (who understand the statistical implications) and marketing practitioners (who understand the operational reality of targeting, bidding, and audience construction). A statistically valid DAG that misrepresents how the ad platform actually works is worthless.
Layer 2: Identification
Apply the backdoor criterion (or front-door criterion, or instrumental variable conditions) to determine whether causal effects are identifiable from your data. If they are not, because unmeasured confounders leave open backdoor paths that no observed adjustment set can block, be honest about it. No amount of statistical sophistication can identify a causal effect that your data does not support.
For each channel, document: (a) the set of confounders, (b) which confounders you can measure, (c) the adjustment set that satisfies the backdoor criterion, and (d) any remaining unidentified confounders and their likely direction of bias.
Layer 3: Estimation
Apply doubly robust estimation as the default method. Use IPW weights from a propensity model combined with a flexible outcome model (gradient boosted trees or neural network). For channels where unmeasured confounding is severe (typically retargeting and brand search), supplement with instrumental variable estimates where credible instruments exist.
Report confidence intervals, not point estimates. A causal estimate of 2.1x ROAS with a 95% CI of [1.4, 2.8] is far more useful than a point estimate of 2.1x, because it communicates the uncertainty that the model cannot resolve.
Layer 4: Validation
Run incrementality tests on at least two channels per quarter. Compare experimental results against model predictions. Track model calibration over time. If the model's predictions drift from experimental reality, the DAG needs revision, either a missing confounder has become important or the causal structure has changed.
Validation is not optional. It is the only mechanism that prevents the causal model from becoming another form of overconfident self-deception.
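A lightweight way to track that calibration, sketched with hypothetical per-channel numbers in the spirit of the Gordon et al. comparison above:

```python
# Hypothetical per-channel numbers: model prediction vs. incrementality test result
model_roas = {"retargeting": 1.9, "display": 2.1, "paid_social": 2.6}
experiment_roas = {"retargeting": 1.6, "display": 2.4, "paid_social": 2.8}

errors = {ch: (model_roas[ch] - experiment_roas[ch]) / experiment_roas[ch]
          for ch in model_roas}
mape = sum(abs(e) for e in errors.values()) / len(errors)

print(errors)               # signed error per channel
print(f"MAPE: {mape:.1%}")  # trigger a DAG review if this drifts upward quarter over quarter
```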
Layer 5: Allocation
Feed causal estimates into a constrained optimization model that maximizes incremental conversions (or incremental revenue, or incremental contribution margin) subject to budget constraints, channel-level minimum and maximum spend constraints, and diminishing returns curves.
The diminishing returns curves are critical. Even if display is more efficient than retargeting at current spend levels, there exists some level of display spend at which marginal returns drop below retargeting's marginal returns. The optimization must account for this.
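A minimal sketch of the allocation step, assuming each channel's causal response curve has been summarized as a concave power function of spend; the coefficients and bounds below are illustrative placeholders, not estimates:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical causal response curves: incremental conversions = a * spend^gamma (gamma < 1)
channels = ["retargeting", "display", "paid_social", "nonbrand_search"]
a = np.array([40.0, 90.0, 70.0, 80.0])       # scale, from causal estimates
gamma = np.array([0.35, 0.55, 0.50, 0.45])   # curvature, i.e., diminishing returns

budget = 4_200_000 / 1000   # work in $ thousands
lo = np.full(4, 50.0)       # channel-level minimum spend constraint
hi = np.full(4, 2500.0)     # channel-level maximum spend constraint

def neg_incremental_conversions(spend):
    return -np.sum(a * np.power(spend, gamma))

result = minimize(
    neg_incremental_conversions,
    x0=np.full(4, budget / 4),
    bounds=list(zip(lo, hi)),
    constraints=[{"type": "eq", "fun": lambda s: s.sum() - budget}],
    method="SLSQP",
)

for ch, s in zip(channels, result.x):
    print(f"{ch:>16}: ${s:,.0f}k")
```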
Causal Attribution Framework: Layer Summary
| Layer | Activity | Key Output | Responsible Team |
|---|---|---|---|
| 1. Structural Specification | Draw the DAG with all variables and causal relationships | Validated causal graph | Data Science + Marketing Ops |
| 2. Identification | Apply backdoor/front-door criteria, assess identifiability | Adjustment sets per channel, documented limitations | Data Science |
| 3. Estimation | Doubly robust / IV estimation with confidence intervals | Channel-level causal ROAS with uncertainty bands | Data Science + Engineering |
| 4. Validation | Quarterly incrementality tests vs. model predictions | Model calibration metrics, DAG revisions | Data Science + Marketing |
| 5. Allocation | Constrained optimization with diminishing returns | Budget allocation plan with expected incremental outcomes | Marketing + Finance |
Implementation: From Theory to Production
Moving from theory to a production causal attribution system requires solving four engineering problems.
Problem 1: Data Infrastructure
Causal models require pre-exposure covariates, user behavior before any ad impression in a given session or journey. Most marketing data warehouses are organized around touchpoint sequences, not pre-exposure state. You need to restructure your data pipeline to capture, for each user-journey:
- The complete set of organic behaviors before first paid touchpoint.
- Historical behavioral features (purchase frequency, browse frequency, category affinity) computed from a lookback window.
- Contextual features at the time of first exposure (day of week, time of day, device, geographic region).
- The full sequence of paid touchpoints with timestamps.
- Conversion outcome and value.
This is a significant data engineering effort. Budget 4-8 weeks for a team familiar with your existing infrastructure.
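A sketch of that journey-level reshaping in pandas, assuming a hypothetical raw event log with user_id, event_type, and timestamp columns:

```python
import pandas as pd

# Hypothetical raw event log: one row per event with
# user_id, event_type ("visit", "impression", "purchase"), channel, timestamp
events = pd.read_parquet("events.parquet").sort_values(["user_id", "timestamp"])

# First paid touchpoint per user defines the end of the pre-exposure window
first_paid = (events[events["event_type"] == "impression"]
              .groupby("user_id", as_index=False)["timestamp"].min()
              .rename(columns={"timestamp": "first_exposure_ts"}))

df = events.merge(first_paid, on="user_id", how="left")
pre = df[df["timestamp"] < df["first_exposure_ts"]]

# Pre-exposure covariates: behavior strictly before any paid touch
features = pre.assign(
    is_visit=(pre["event_type"] == "visit"),
    is_purchase=(pre["event_type"] == "purchase"),
).groupby("user_id").agg(
    pre_visit_count=("is_visit", "sum"),
    historical_purchases=("is_purchase", "sum"),
    last_organic_ts=("timestamp", "max"),
)

exposure_ts = first_paid.set_index("user_id")["first_exposure_ts"]
features["days_since_last_visit"] = (exposure_ts - features["last_organic_ts"]).dt.days
```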
Problem 2: Propensity Modeling
For each channel, you need a propensity model that estimates P(ad exposure | pre-exposure covariates). This is a standard classification problem, but with two important nuances.
First, the model must be well-calibrated, not just discriminative. A model with high AUC but poor calibration will produce unstable IPW weights. Use Platt scaling or isotonic regression to calibrate probability estimates.
Second, positivity must hold: every user must have a nonzero probability of being either exposed or unexposed. If your retargeting campaign targets 100% of past visitors with no holdout, the propensity score is 1.0 for everyone, and the IPW estimator is undefined. This is why implementing random holdouts (even small ones, 2-5%) for every campaign is a prerequisite for causal measurement.
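A sketch of a calibrated propensity model with a basic positivity check, using scikit-learn's isotonic calibration; the 1% cutoff below is an illustrative threshold, not a standard:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier

def calibrated_propensity(X, t):
    """Propensity scores with isotonic calibration, plus a positivity check."""
    model = CalibratedClassifierCV(
        GradientBoostingClassifier(), method="isotonic", cv=5
    )
    model.fit(X, t)
    e = model.predict_proba(X)[:, 1]

    # Positivity: flag users whose exposure probability is effectively 0 or 1.
    # If most of the audience sits here, you need a random holdout, not more modeling.
    violating = np.mean((e < 0.01) | (e > 0.99))
    print(f"Share of users near positivity violation: {violating:.1%}")
    return e
```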
Problem 3: Model Estimation at Scale
For a company running 20+ campaigns across 6+ channels, you need to estimate causal effects for each channel, each campaign, each audience segment, and each creative variant. This is thousands of estimates, each requiring propensity modeling and doubly robust estimation.
The solution is standardization. Build a modular estimation pipeline where the DAG, adjustment set, and estimation method are configured per channel, and the pipeline runs automatically on a daily or weekly cadence. Treat the causal model as a production ML system with monitoring, alerting, and versioning.
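One way to standardize it, sketched with hypothetical channel names, adjustment sets, and estimator keys:

```python
# Hypothetical per-channel configuration driving a standardized estimation pipeline
CHANNEL_CONFIG = {
    "retargeting": {
        "adjustment_set": ["pre_visit_count", "historical_purchases",
                           "days_since_last_visit", "device_type"],
        "estimator": "doubly_robust",
    },
    "display_prospecting": {
        "adjustment_set": ["device_type", "geo_region", "daypart"],
        "estimator": "ipw",
    },
    "brand_search": {
        "adjustment_set": ["pre_visit_count", "historical_purchases"],
        "estimator": "iv",            # backdoor adjustment likely insufficient here
        "instrument": "auction_win",
    },
}

def run_all(df, estimators):
    """Run the configured estimator for every channel on a fixed cadence."""
    return {
        channel: estimators[cfg["estimator"]](df, cfg)
        for channel, cfg in CHANNEL_CONFIG.items()
    }
```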
Problem 4: Organizational Change
This is the hardest problem. Your marketing team has been making decisions based on MTA for years. Telling them that retargeting, the channel they have been growing, the channel that looks best in every dashboard, is actually 3-4x less effective than they believe is an organizational shock.
The transition requires executive sponsorship, a structured education program, and a phased approach. Start with one channel where the MTA-vs-causal gap is largest and most clearly validated by an incrementality test. Demonstrate the budget reallocation and its results. Then expand.
Do not try to replace the MTA dashboard overnight. Run the causal model in parallel for two quarters, showing both sets of numbers. Let the incrementality tests settle the debate. Data wins arguments that theory cannot.
The Organizational Problem Nobody Talks About
There is a reason MTA persists despite being wrong. MTA tells people what they want to hear.
The retargeting team wants to believe retargeting works. The brand search team wants to believe brand search works. MTA confirms both. It distributes credit generously to channels that are easy to measure and hard to turn off.
Causal attribution, by contrast, delivers uncomfortable truths. It tells the retargeting team that 75% of their "conversions" would have happened anyway. It tells the brand search team that they are buying clicks that organic listings would have captured for free. It tells the display team that their contribution is 2.5x what the dashboard says, but the display team is usually the smallest, least politically powerful group in the organization.
This is a principal-agent problem. The people who run channels have incentives to use measurement systems that make their channels look good. MTA serves these incentives perfectly. Causal models disrupt them.
The solution is to separate measurement from channel management. The team that estimates causal effects should not report to the same leadership as the teams that manage channels. This is the same principle that separates auditing from accounting in financial management. It exists for the same reason.
The companies that have successfully adopted causal attribution share three characteristics. First, the CFO or CEO cares about marketing efficiency and demands experimental proof of advertising impact. Second, the data science team reports independently of the marketing team. Third, there is a culture of treating marketing as an investment with measurable returns, not as an expense to be minimized or a creative endeavor beyond measurement.
Further Reading
- Judea Pearl on Wikipedia: The father of causal inference
- DoWhy Library (GitHub): Causal inference in Python
- do-calculus on Wikipedia: Pearl's formal intervention framework
References
- Bang, H., & Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4), 962-973.
- Blake, T., Nosko, C., & Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica, 83(1), 155-174.
- Gordon, B. R., Zettelmeyer, F., Bhatt, N., & Larsen, B. (2019). A comparison of approaches to advertising measurement: Evidence from big field experiments at Facebook. Marketing Science, 38(2), 193-225.
- Johnson, G. A., Lewis, R. A., & Nubbemeyer, E. I. (2017). Ghost ads: Improving the economics of measuring online ad effectiveness. Journal of Marketing Research, 54(6), 867-884.
- Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
- Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96-146.
- Rao, J. M., & Simonov, A. (2023). Correcting for selection bias in advertising measurement using instrumental variables. Marketing Science, 42(3), 412-431.
- Shapiro, B. T., Hitsch, G. J., & Tuchman, A. E. (2021). TV advertising effectiveness and profitability: Generalizable results from 288 brands. Econometrica, 89(4), 1855-1879.
- Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.
- Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
- Lewis, R. A., & Rao, J. M. (2015). The unfavorable economics of measuring the returns to advertising. The Quarterly Journal of Economics, 130(4), 1941-1973.
The Conversation
4 replies
The 340% overstatement for retargeting is conservative in my experience. When we ran a series of clean geo hold-outs in 2022-23, retargeting came in somewhere around 6-8x overstated vs. last-click and around 4x overstated vs. data-driven Markov attribution. The funny part is that once you share the number with the paid team, 'data-driven attribution' suddenly becomes 'methodologically problematic' in their vocabulary.
Strong piece. One technical nit: DAGs alone don't identify causal effects, you need a DAG *plus* assumptions (no unmeasured confounding, positivity, etc.) to get identification. In marketing, the no-unmeasured-confounding assumption is almost always violated because attention, intent, and category interest are unobserved. DAGs help reason about what CAN be identified given structure, but the practical answer is still 'run the experiment' when you can.
we moved off MTA entirely in 2023 after yet another audit showed retargeting vastly overstated. replaced it with a weekly MMM + per-channel incrementality tests rotating through the year. the org friction was brutal, the paid teams had KPIs tied to a model that was measuring the wrong thing, but 6 months later the budget reallocation paid for itself multiple times over
For readers who want a pragmatic middle-ground: double-machine-learning MTA (Chernozhukov et al. 2018, applied to marketing by a few Booking/Amazon papers since) lets you use observational data but explicitly controls for confounders you can observe. Not a substitute for experiments but a big upgrade on vanilla last-click. The failure mode is the same though: you're still not capturing the unobserved intent signal.