The data

Original data, citable and open.

17 datasets produced or analyzed for the essays. Each is available as structured JSON and documented with its methodology, sample size, and license.

  • Loss Aversion Ratios by Stake Level

    JSON →

    Empirical loss aversion coefficient λ observed in marketplace pricing experiments, decomposed by transaction stake level. Shows that λ is not fixed at the textbook value of 2.25 but varies systematically with stake magnitude and with the user's investment in the platform.

    Sample size
    ~14.1M total observations across buckets
    Collected
    2024-06/2025-05
    License
    CC-BY-4.0 for cited figures

    Used in · loss-aversion-asymmetry-digital-marketplaces

  • Churn Windows by Discount-Type Subscriber Cohort

    JSON →

    Observed cancellation concentration around billing dates across subscription cohorts, decomposed by estimated beta (present-bias) parameter. Shows that 52–68% of annual churn events in consumer subscriptions occur within 7 days of a billing date — consistent with the beta-delta hyperbolic discounting model's prediction of billing-day regret.

    Sample size
    ~2.4M subscriber-months
    Collected
    2022-01/2024-12
    License
    CC-BY-4.0 for aggregate figures

    Used in · hyperbolic-discounting-subscription-churn
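
The billing-day regret that the beta-delta model predicts can be illustrated with a short sketch. Every number here is an assumption chosen for illustration (β = 0.7, δ = 0.99 per day, a price of 100, a benefit of 120 arriving one day after billing); none of it is estimated from the dataset.

```python
def discount(t: int, beta: float, delta: float) -> float:
    """Quasi-hyperbolic (beta-delta) weight on a payoff t periods from now."""
    return 1.0 if t == 0 else beta * delta ** t


def renewal_value(price: float, benefit: float, days_to_billing: int,
                  beta: float, delta: float) -> float:
    """Perceived net value of renewing, as seen `days_to_billing` days ahead.

    The price is paid on the billing date; the benefit is assumed to
    arrive one day later (a simplification for illustration).
    """
    cost_now = price * discount(days_to_billing, beta, delta)
    gain_now = benefit * discount(days_to_billing + 1, beta, delta)
    return gain_now - cost_now


# Hypothetical parameters, not estimates from the dataset.
BETA, DELTA = 0.7, 0.99

planned = renewal_value(100, 120, days_to_billing=30, beta=BETA, delta=DELTA)
on_the_day = renewal_value(100, 120, days_to_billing=0, beta=BETA, delta=DELTA)

print(planned > 0)     # a month ahead, renewing looks worthwhile
print(on_the_day < 0)  # on the billing date, the same renewal looks like a loss
```

The sign flips because β discounts everything except the present: a month ahead both the price and the benefit carry the β penalty, while on the billing date only the benefit does.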

  • Ladder-Up vs Ladder-Down SaaS Pricing Conversion

    JSON →

    Direct A/B test of ladder-up (start on free or starter, prompt to upgrade) vs ladder-down (start on premium trial, prompt to downgrade) pricing paths across 9 SaaS products. Ladder-down converts 31–58% more paying users, driven by endowment-effect-induced resistance to losing premium features.

    Sample size
    ~182K signups across 9 products
    Collected
    2024-Q2/2024-Q3
    License
    CC-BY-4.0 for aggregate figures

    Used in · endowment-effect-saas-pricing

  • Platform Entry Threshold by Complementor Category Share

    JSON →

    Observed relationship between a complementor category's share of platform transaction volume and the probability that the platform enters the category within 24 months. Entry becomes likely once a category exceeds ~8% of platform volume.

    Sample size
    412 categories across 4 platforms
    Collected
    2018/2024
    License
    CC-BY-4.0 for cited figures

    Used in · platform-cannibalization-dynamics

  • Vertical SaaS Market Concentration and Multi-Homing

    JSON →

    Market concentration (top-3 share) and supplier-side multi-homing rates across 40 vertical SaaS markets. Winner-take-most is the exception, not the rule: only 22% of vertical markets exhibit top-3 share above 70%, and those markets also show low multi-homing.

    Sample size
    40 vertical markets
    Collected
    2024
    License
    CC-BY-4.0 for synthesis

    Used in · winner-take-most-multi-homing-vertical-saas · two-sided-network-effects-dead

  • MTA Reported ROAS vs Experimental (Incrementality) ROAS

    JSON →

    Side-by-side comparison of ROAS reported by multi-touch attribution systems versus ROAS estimated via randomized geo-lift experiments for the same channels and periods. MTA systematically overstates ROAS by 2.4–6.5x, with the gap widest for retargeting and display.

    Sample size
    6 published studies, 22 channel-study combinations
    Collected
    2015/2024
    License
    CC-BY-4.0 for synthesis

    Used in · multi-touch-attribution-causal-inference-dag · unified-measurement-architecture-mmm-mta-experimentation

  • Bayesian MMM — Channel Saturation and Adstock Parameters

    JSON →

    Posterior estimates of adstock half-life and saturation parameters (Hill function) for eight paid-media channels from a privacy-first Bayesian marketing mix model. Reveals that TV has the longest decay (12-week half-life) while search has the shortest (under 1 week).

    Sample size
    156 weeks × 180 DMAs × 8 channels
    Collected
    2022-01/2024-12
    License
    CC-BY-4.0 for cited figures

    Used in · marketing-mix-modeling-privacy-first-era · unified-measurement-architecture-mmm-mta-experimentation
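
For readers unfamiliar with the two parameter families in this entry, here is a minimal sketch of how an adstock half-life and a Hill saturation curve are commonly written. It assumes geometric adstock, which the entry does not specify, and the function names and spend series are hypothetical.

```python
def decay_from_half_life(half_life_weeks: float) -> float:
    """Weekly carry-over rate such that an effect halves every `half_life_weeks`."""
    return 0.5 ** (1.0 / half_life_weeks)


def geometric_adstock(spend, half_life_weeks):
    """Carry a share of each week's spend effect into the following weeks."""
    decay = decay_from_half_life(half_life_weeks)
    carried, out = 0.0, []
    for x in spend:
        carried = x + decay * carried
        out.append(carried)
    return out


def hill(x, half_saturation, slope):
    """Hill curve: 0 at x = 0, 0.5 at x = half_saturation, approaching 1."""
    if x <= 0:
        return 0.0
    return x ** slope / (x ** slope + half_saturation ** slope)


# Hypothetical series: one burst of 100 units, then 12 weeks of no spend.
tv = geometric_adstock([100.0] + [0.0] * 12, half_life_weeks=12)

# With a 12-week half-life, about half of the burst is still present at week 12.
print(round(tv[12], 6))
```

A channel with a half-life under one week would lose most of the same burst within the first week or two, which is the contrast the entry draws between search and TV.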

  • CausalImpact Lift from a B2B Content Program

    JSON →

    Bayesian structural time series (CausalImpact) estimate of the causal lift on organic traffic from launching a dedicated 36-article B2B content program. Non-branded organic captures only 38% of total SEO impact; the remaining 62% flows through branded search and direct traffic.

    Sample size
    104 weeks, 8 control variables, 6 outcomes
    Collected
    2023-Q1/2025-Q1
    License
    CC-BY-4.0 for cited figures

    Used in · causal-impact-seo-branded-search · compounding-advantage-content-moats-seo

  • Cohort LTV/CAC and Payback by Acquisition Channel

    JSON →

    Acquisition-cohort unit economics for a consumer SaaS business, decomposed by channel. Exposes the aggregation fallacy: the rolled-up 3.1x LTV/CAC hides a channel portfolio with individual ratios ranging from 0.8x (brand-misaligned display) to 8.4x (organic referral), with very different payback profiles.

    Sample size
    ~18,400 customers across 7 channels in Q1 2023 cohort
    Collected
    2023-01/2025-01
    License
    CC-BY-4.0 for cited figures

    Used in · cohort-based-unit-economics · clv-control-variable-bid-strategies
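
The aggregation fallacy is simple arithmetic, sketched below. The three channels and all of their figures are hypothetical and are not the dataset's values; only the pattern (a healthy-looking blended ratio over very unequal channels) mirrors the entry.

```python
# Hypothetical cohort: (customers, LTV per customer, CAC per customer).
channels = {
    "organic_referral": (100, 840.0, 100.0),
    "paid_search": (200, 600.0, 200.0),
    "display": (100, 240.0, 300.0),
}

# Ratio for each channel on its own.
per_channel = {name: ltv / cac for name, (_, ltv, cac) in channels.items()}

# Blended ratio: total lifetime value over total acquisition spend.
total_ltv = sum(n * ltv for n, ltv, _ in channels.values())
total_cac = sum(n * cac for n, _, cac in channels.values())
blended = total_ltv / total_cac

for name, ratio in per_channel.items():
    print(f"{name}: {ratio:.1f}x")
print(f"blended: {blended:.2f}x")
```

In this toy cohort the blended figure sits near 3x even though one channel loses money on every customer it acquires.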

  • Test Duration Reduction from Bayesian vs Frequentist A/B Testing

    JSON →

    Head-to-head comparison of decision latency between Bayesian posterior-probability testing and classical frequentist fixed-sample testing across 48 production experiments. Median time-to-decision dropped 36% under Bayesian methodology with no increase in downstream product regret.

    Sample size
    48 experiments, ~29M visitor-sessions total
    Collected
    2024-Q1/2025-Q1
    License
    CC-BY-4.0 for aggregate figures

    Used in · bayesian-ab-testing-practice
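
A posterior-probability decision rule of the kind compared here can be sketched with a Beta-Binomial model. This is an illustrative assumption, not the method used in the 48 experiments: the uniform Beta(1, 1) priors, the 0.95 threshold, and the conversion counts are all hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


THRESHOLD = 0.95  # hypothetical decision threshold

# Hypothetical counts: 100/1000 conversions on A, 150/1000 on B.
p = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
decision = "ship B" if p >= THRESHOLD else "keep collecting data"
print(decision)
```

Because the rule can be evaluated at any point during the test, a clear winner can be called before a fixed sample size is reached, which is where a shorter time-to-decision would come from.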

  • Cox Proportional Hazards — SaaS Churn Covariates

    JSON →

    Fitted hazard ratios for ten covariates on 18-month SaaS subscriber survival. Feature usage depth and onboarding completion dominate (hazard ratios 0.34 and 0.41 respectively); price tier and annual billing have smaller but significant effects. Shows that churn is primarily a product-engagement phenomenon, not a pricing phenomenon.

    Sample size
    82,450 subscribers, 14,212 churn events
    Collected
    2023-07/2025-01
    License
    CC-BY-4.0 for cited figures

    Used in · survival-analysis-subscription-businesses
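
To read the hazard ratios quoted above: a Cox model multiplies the baseline hazard by exp(Σ βᵢxᵢ), and each hazard ratio is exp(βᵢ). The sketch below uses the two ratios from the entry and assumes, hypothetically, that each covariate is coded so that one unit corresponds to the quoted ratio and that there is no interaction term.

```python
import math

# Hazard ratios quoted in the entry above.
HAZARD_RATIOS = {"feature_usage_depth": 0.34, "onboarding_completion": 0.41}


def coefficient(hazard_ratio: float) -> float:
    """Cox regression coefficient implied by a hazard ratio."""
    return math.log(hazard_ratio)


def relative_hazard(covariates: dict) -> float:
    """Hazard relative to baseline for the given covariate values."""
    log_hazard = sum(
        coefficient(HAZARD_RATIOS[name]) * value
        for name, value in covariates.items()
    )
    return math.exp(log_hazard)


def hazard_reduction_pct(hazard_ratio: float) -> float:
    """Percentage drop in churn hazard implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


both = relative_hazard({"feature_usage_depth": 1, "onboarding_completion": 1})
print(round(hazard_reduction_pct(0.34)))  # percent lower hazard per unit of usage depth
print(round(both, 4))                     # combined relative hazard under the assumptions above
```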

E-commerce ML · 3 datasets

  • Learning-to-Rank Revenue Lift by Objective Function

    JSON →

    Incremental revenue per session from different ranking objective functions on an e-commerce search result page. Revenue-weighted composite (relevance × margin × projected LTV) outperforms pure relevance ranking by 23% in GMV per session, with neutral effect on relevance perception.

    Sample size
    ~14.2M search sessions, 4 variants
    Collected
    2024-08/2024-10
    License
    CC-BY-4.0 for cited figures

    Used in · search-ranking-revenue-optimization-l2r
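
The composite objective named in this entry (relevance × margin × projected LTV) can be sketched as a scoring function. The `Item` fields, their scales, and the three example items are hypothetical; the production objective may normalize or weight the terms differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    sku: str
    relevance: float      # relevance score in [0, 1]
    margin: float         # contribution margin as a fraction of price
    projected_ltv: float  # projected lifetime value, in currency units


def composite_score(item: Item) -> float:
    """Revenue-weighted score: relevance x margin x projected LTV."""
    return item.relevance * item.margin * item.projected_ltv


# Hypothetical catalog slice.
items = [
    Item("A", relevance=0.9, margin=0.10, projected_ltv=100.0),
    Item("B", relevance=0.8, margin=0.30, projected_ltv=120.0),
    Item("C", relevance=0.6, margin=0.25, projected_ltv=90.0),
]

by_relevance = [i.sku for i in sorted(items, key=lambda i: i.relevance, reverse=True)]
by_composite = [i.sku for i in sorted(items, key=composite_score, reverse=True)]

print(by_relevance)
print(by_composite)
```

The most relevant item drops to last under the composite score because its margin is low, which is the kind of reordering the variants in this dataset compare.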

  • Transformer Product Embeddings — CTR Lift vs Collaborative Filtering

    JSON →

    CTR and downstream conversion lift from replacing a matrix-factorization collaborative filter with transformer-based session embeddings (BERT4Rec-style). Transformer embeddings lift CTR by 18–32% across cold-start, returning-user, and category-diverse segments.

    Sample size
    ~6.2M users, 4 segments
    Collected
    2024-10/2024-12
    License
    CC-BY-4.0 for cited figures

    Used in · transformer-product-embeddings-ecommerce · cold-start-problem-few-shot-learning

  • Uplift Modeling — Persuadable Share by Customer Segment

    JSON →

    Share of customers falling into each of the four uplift quadrants (sure-thing, persuadable, lost-cause, do-not-disturb) for a promotional email campaign, decomposed by customer segment. Only 18% of the audience is genuinely persuadable; 64% of promotional budget is historically wasted on the other three groups.

    Sample size
    ~1.6M customers, 4-segment decomposition
    Collected
    2024-Q3/2024-Q4
    License
    CC-BY-4.0 for cited figures

    Used in · personalized-promotion-uplift-modeling
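
The four quadrants can be assigned from two predicted conversion probabilities per customer, one with the promotion and one without. The sketch below is one possible rule; the thresholds (`min_uplift`, `likely`) and the example customers are hypothetical and do not come from the dataset.

```python
from collections import Counter


def quadrant(p_treated: float, p_control: float,
             min_uplift: float = 0.02, likely: float = 0.5) -> str:
    """Assign an uplift quadrant from predicted conversion probabilities."""
    uplift = p_treated - p_control
    if uplift >= min_uplift:
        return "persuadable"     # the promotion makes conversion more likely
    if uplift <= -min_uplift:
        return "do-not-disturb"  # the promotion makes conversion less likely
    # Negligible uplift: the customer behaves the same either way.
    return "sure-thing" if p_control >= likely else "lost-cause"


# Hypothetical (p_treated, p_control) predictions for five customers.
predictions = [(0.30, 0.10), (0.80, 0.81), (0.05, 0.04), (0.10, 0.25), (0.62, 0.61)]

counts = Counter(quadrant(pt, pc) for pt, pc in predictions)
persuadable_share = counts["persuadable"] / len(predictions)
print(counts)
print(persuadable_share)
```

Only customers in the persuadable quadrant change behavior in the desired direction, so promotional spend on the other three buys no incremental conversions under this framing.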

  • Cost per Attention Second by Media Format

    JSON →

    CPAS (cost per attention second) computed across 12 digital and traditional media formats from eye-tracking and dwell-inferred attention data. Display banners — the cheapest format on CPM — are the most expensive on attention. Connected-TV and audio invert the traditional CPM-based ROI ranking.

    Sample size
    ~120M measured impressions across 7 studies
    Collected
    2022-03/2024-11
    License
    CC-BY-4.0 for cited figures

    Used in · attention-economics-cognitive-load-advertising
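
A short sketch of why a CPM ranking and an attention ranking can disagree. It assumes CPAS is cost per impression divided by average attention seconds per impression, which is the natural reading of the name but is not spelled out in the entry; the two formats' figures are hypothetical.

```python
def cpas(cpm: float, attention_seconds_per_impression: float) -> float:
    """Cost per attention second, from CPM and average attention per impression."""
    cost_per_impression = cpm / 1000.0
    return cost_per_impression / attention_seconds_per_impression


# Hypothetical inputs: (CPM in currency units, average attention seconds per impression).
formats = {
    "display_banner": (2.0, 0.5),
    "connected_tv": (30.0, 12.0),
}

cheapest_by_cpm = min(formats, key=lambda name: formats[name][0])
cheapest_by_cpas = min(formats, key=lambda name: cpas(*formats[name]))

print(cheapest_by_cpm)
print(cheapest_by_cpas)
```

With these inputs the banner is 15 times cheaper per thousand impressions yet costs more per second of attention, because each impression holds attention so briefly.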

  • Creative Fatigue Decay by Impression Band

    JSON →

    Relative response (click-through and post-click conversion) as the same creative is shown repeatedly to the same audience, segmented by audience-frequency decile. Fatigue onset is earlier than industry convention assumes — entropy-based detection flags decay 2–4 weeks before CTR collapse.

    Sample size
    ~3.8B impressions, 14 campaigns
    Collected
    2024-Q2/2025-Q1
    License
    CC-BY-4.0 for cited figures

    Used in · creative-fatigue-detection-entropy-metrics

  • Content Moat — Traffic per Article as Archive Grows

    JSON →

    Traffic per article as a niche content archive grows from 1 to 200+ articles. Per-article traffic compounds with archive size (a network effect from internal linking and topical authority) rather than staying flat: the 50th article gets 2.8x the traffic of the 1st article at identical quality.

    Sample size
    8 sites, 1,420 articles tracked
    Collected
    2022-01/2025-01
    License
    CC-BY-4.0 for cited figures

    Used in · compounding-advantage-content-moats-seo