The data

Anonymized partner data, citable and open.

17 datasets, aggregated under NDA from operating-company partners, de-identified, and published as structured JSON with methodology, sample size, and license.

How the datasets are sourced

Each dataset in this catalog is aggregated from an operating-company partner that agreed to share raw data with me under a non-disclosure agreement, on the condition that what gets published is de-identified. “De-identified” here means that the company name is not disclosed, any row-level IDs are stripped or hashed, and what you see is the least granular aggregation that still answers the analytic question.

The raw rows remain with the originating company. What is published on this page is the aggregation, the methodology, and the resulting statistics: enough to audit the argument in the essays that cite the dataset, but not enough to re-identify a source. This is the same arrangement academic economists use when publishing research on IRS, census, or proprietary administrative data.
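A minimal sketch of the two de-identification steps described above, assuming a keyed hash (HMAC-SHA256 from Python's standard library) for row-level IDs and a small-cell suppression rule for aggregates. The key, the `min_cell` threshold of 10, and the function names are hypothetical; the partners' actual pipelines are not described on this page.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret key, held by the data owner and never published.
SECRET_KEY = b"partner-held-secret"

def pseudonymize(row_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a row-level ID with a keyed HMAC-SHA256 digest.

    A keyed hash (rather than a bare SHA-256) stops anyone without the key
    from re-deriving IDs by hashing a list of candidate values.
    """
    return hmac.new(key, row_id.encode("utf-8"), hashlib.sha256).hexdigest()

def aggregate(rows, group_key, value_key, min_cell=10):
    """Publish only per-group counts and means, suppressing small cells."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    published = {}
    for group, values in groups.items():
        if len(values) >= min_cell:  # hypothetical threshold for safe publication
            published[group] = {"n": len(values), "mean": sum(values) / len(values)}
    return published

# Hypothetical rows: 12 in one bucket, 3 in another.
rows = [{"id": pseudonymize(f"u{i}"), "bucket": "low" if i < 12 else "high", "value": 1.0}
        for i in range(15)]
summary = aggregate(rows, "bucket", "value")
```

The "high" bucket, with only three rows, is dropped from `summary` rather than published.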

If you want to cite a dataset and need methodology detail the page doesn't list, reach out on LinkedIn. The partner will often approve additional disclosure on request.

Behavioral Economics · 3 datasets

Loss Aversion Ratios by Stake Level
JSON →
Empirical loss aversion coefficient λ observed in marketplace pricing experiments, decomposed by transaction stake level. Shows that λ is not a constant (the textbook value is 2.25) but varies systematically with stake magnitude and with the user's investment in the platform.
Sample size
~14.1M total observations across buckets
Collected
2024-06/2025-05
License
CC-BY-4.0 for cited figures
Used in · loss-aversion-asymmetry-digital-marketplaces
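For reference, λ is the loss multiplier in the Tversky-Kahneman (1992) value function, whose median estimates were α = β = 0.88 and λ = 2.25. The sketch below encodes that function and a hypothetical way to read λ off a pricing experiment (response to an equal-sized loss divided by response to an equal-sized gain); this estimator is an assumption for illustration, not the methodology behind the dataset.

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman (1992) value function; defaults are their median estimates."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta

def implied_lambda(response_to_gain, response_to_loss):
    """Hypothetical estimator: |response to a loss| / |response to an equal gain|.

    For example, the conversion drop after a +d price change divided by the
    conversion gain after a -d price change at the same stake level.
    """
    return abs(response_to_loss) / abs(response_to_gain)

# With alpha == beta, the loss/gain ratio at any stake recovers lambda.
ratio = prospect_value(-50.0) / prospect_value(50.0)
```

A stake-dependent λ, as this dataset reports, would replace the constant `lam` with a function of the stake.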
Churn Windows by Discount-Type Subscriber Cohort
JSON →
Observed cancellation concentration around billing dates across subscription cohorts, decomposed by the estimated β (present-bias) parameter. Shows that 52-68% of annual churn events in consumer subscriptions occur within 7 days of a billing date, consistent with the β-δ (quasi-hyperbolic) discounting model's prediction of billing-day regret.
Sample size
~2.4M subscriber-months
Collected
2022-01/2024-12
License
CC-BY-4.0 for aggregate figures
Used in · hyperbolic-discounting-subscription-churn
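The β-δ model values a stream of per-period utilities as u_0 + β Σ_{t≥1} δ^t u_t, so anything not immediate takes an extra discount β < 1. A minimal sketch with hypothetical parameters (β = 0.7, δ = 0.99, a 10-unit charge buying 11 units of benefit one period later) shows the preference reversal behind billing-day regret: the same plan looks worth keeping one period before the charge and not worth keeping on the day the charge becomes immediate. The numbers are illustrative, not estimates from the dataset.

```python
def quasi_hyperbolic_value(flows, beta, delta):
    """Present value of per-period utilities under beta-delta discounting.

    flows[0] is utility now; flows[t] for t >= 1 is discounted by beta * delta**t.
    """
    future = sum(delta ** t * u for t, u in enumerate(flows) if t >= 1)
    return flows[0] + beta * future

# Hypothetical parameters, not estimates from the dataset.
BETA, DELTA = 0.7, 0.99
CHARGE, BENEFIT = 10.0, 11.0

# One period before billing: both the charge and the benefit lie in the future.
value_before_billing = quasi_hyperbolic_value([0.0, -CHARGE, BENEFIT], BETA, DELTA)
# On billing day: the charge is immediate, the benefit still arrives later.
value_on_billing_day = quasi_hyperbolic_value([-CHARGE, BENEFIT], BETA, DELTA)

print(f"before billing: {value_before_billing:+.3f}")  # positive: keep
print(f"on billing day: {value_on_billing_day:+.3f}")  # negative: cancel
```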
Ladder-Up vs Ladder-Down SaaS Pricing Conversion
JSON →
Direct A/B test of ladder-up (start on free or starter, prompt to upgrade) vs ladder-down (start on premium trial, prompt to downgrade) pricing paths across 9 SaaS products. Ladder-down converts 31-58% more paying users, driven by endowment-effect-induced resistance to losing premium features.
Sample size
~182K signups across 9 products
Collected
2024-Q2/Q3
License
CC-BY-4.0 for aggregate figures
Used in · endowment-effect-saas-pricing

Digital Economics · 2 datasets

Platform Entry Threshold by Complementor Category Share
JSON →
Observed relationship between a complementor category's share of platform transaction volume and the probability that the platform enters the category within 24 months. Entry becomes likely once a category exceeds ~8% of platform volume.
Sample size
412 categories across 4 platforms
Collected
2018-2024
License
CC-BY-4.0 for cited figures
Used in · platform-cannibalization-dynamics
Vertical SaaS Market Concentration and Multi-Homing
JSON →
Market concentration (top-3 share) and supplier-side multi-homing rates across 40 vertical SaaS markets. Winner-take-most is the exception, not the rule: only 22% of vertical markets exhibit top-3 share above 70%, and those markets also show low multi-homing.
Sample size
40 vertical markets
Collected
2024
License
CC-BY-4.0 for synthesis
Used in · winner-take-most-multi-homing-vertical-saas · two-sided-network-effects-dead
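Both metrics here are simple ratios. A minimal sketch, assuming top-3 share is the combined revenue share of the three largest vendors and the multi-homing rate is the fraction of suppliers active on two or more competing platforms; the dataset's exact definitions may differ, and the inputs are hypothetical.

```python
def top3_share(vendor_revenues):
    """Combined market share of the three largest vendors (CR3)."""
    ranked = sorted(vendor_revenues, reverse=True)
    return sum(ranked[:3]) / sum(ranked)

def multihoming_rate(supplier_platforms):
    """Fraction of suppliers that use two or more platforms."""
    multi = sum(1 for platforms in supplier_platforms.values() if len(platforms) >= 2)
    return multi / len(supplier_platforms)

# Hypothetical vertical market: five vendors, three suppliers.
cr3 = top3_share([50.0, 20.0, 10.0, 10.0, 10.0])
mh = multihoming_rate({"s1": {"A"}, "s2": {"A", "B"}, "s3": {"B", "C"}})
winner_take_most = cr3 > 0.70  # the 70% top-3 cutoff used in the description
```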

Marketing Engineering · 3 datasets

MTA Reported ROAS vs Experimental (Incrementality) ROAS
JSON →
Side-by-side comparison of ROAS reported by multi-touch attribution systems versus ROAS estimated via randomized geo-lift experiments for the same channels and periods. MTA systematically overstates ROAS by 2.4-6.5x, with the gap widest for retargeting and display.
Sample size
6 published studies, 22 channel-study combinations
Collected
2015-2024
License
CC-BY-4.0 for synthesis
Used in · multi-touch-attribution-causal-inference-dag · unified-measurement-architecture-mmm-mta-experimentation
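The overstatement factor compares two ROAS figures for the same channel and period. A minimal sketch, assuming the simplest geo-lift estimator (scale the control geos' post-period revenue by the pre-period test/control ratio to get a counterfactual); the studies in the synthesis may use more elaborate methods such as synthetic controls, and all numbers below are hypothetical.

```python
def counterfactual_revenue(test_pre, control_pre, control_post):
    """Revenue the test geos would likely have earned without the campaign.

    Assumes test and control geos keep the revenue ratio they had pre-period.
    """
    return control_post * test_pre / control_pre

def incremental_roas(test_post, counterfactual, spend):
    """Incremental revenue per unit of spend, as measured by the experiment."""
    return (test_post - counterfactual) / spend

def mta_overstatement(mta_roas, experimental_roas):
    """How many times larger the attribution-reported ROAS is than the causal one."""
    return mta_roas / experimental_roas

# Hypothetical channel with 100k spend in the test geos.
cf = counterfactual_revenue(test_pre=1_000_000, control_pre=2_000_000, control_post=2_100_000)
iroas = incremental_roas(test_post=1_200_000, counterfactual=cf, spend=100_000)
factor = mta_overstatement(mta_roas=6.0, experimental_roas=iroas)
```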
Bayesian MMM, Channel Saturation and Adstock Parameters
JSON →
Posterior estimates of adstock half-life and saturation parameters (Hill function) for eight paid-media channels from a privacy-first Bayesian marketing mix model. Reveals that TV has the longest decay (12-week half-life) while search has the shortest (under 1 week).
Sample size
156 weeks × 180 DMAs × 8 channels
Collected
2022-01/2024-12
License
CC-BY-4.0 for cited figures
Used in · marketing-mix-modeling-privacy-first-era · unified-measurement-architecture-mmm-mta-experimentation
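Adstock and Hill saturation are usually chained: spend is first carried forward in time, then passed through a diminishing-returns curve. A minimal sketch assuming geometric adstock parameterized by half-life and a two-parameter Hill function (half-saturation point and shape); this is one common parameterization, not necessarily the one used in the model behind this dataset, and the inputs are hypothetical.

```python
def geometric_adstock(spend, half_life):
    """Carry a share of each period's effect into later periods.

    The per-period retention is set so an effect halves after `half_life`
    periods: decay = 0.5 ** (1 / half_life).
    """
    decay = 0.5 ** (1.0 / half_life)
    carried, out = 0.0, []
    for x in spend:
        carried = x + decay * carried
        out.append(carried)
    return out

def hill(x, half_sat, shape):
    """Hill saturation: response in [0, 1), equal to 0.5 when x == half_sat."""
    if x <= 0:
        return 0.0
    return x ** shape / (x ** shape + half_sat ** shape)

def media_response(spend, half_life, half_sat, shape):
    """Adstock first, then saturation, period by period."""
    return [hill(x, half_sat, shape) for x in geometric_adstock(spend, half_life)]

# A single 100-unit pulse: long (TV-like) vs short (search-like) half-lives, in weeks.
tv_week4 = geometric_adstock([100.0] + [0.0] * 4, half_life=12.0)[4]
search_week4 = geometric_adstock([100.0] + [0.0] * 4, half_life=0.5)[4]
```

With a 12-week half-life most of the pulse is still carried four weeks later, while a half-week half-life leaves almost nothing.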
CausalImpact Lift from a B2B Content Program
JSON →
Bayesian structural time series (CausalImpact) estimate of the causal lift on organic traffic from launching a dedicated 36-article B2B content program. Non-branded organic captures only 38% of total SEO impact; the remaining 62% flows through branded search and direct traffic.
Sample size
104 weeks, 8 control variables, 6 outcomes
Collected
2023-Q1/2025-Q1
License
CC-BY-4.0 for cited figures
Used in · causal-impact-seo-branded-search · compounding-advantage-content-moats-seo

Business Analytics · 3 datasets

Cohort LTV/CAC and Payback by Acquisition Channel
JSON →
Acquisition-cohort unit economics for a consumer SaaS business, decomposed by channel. Exposes the aggregation fallacy: the rolled-up 3.1x LTV/CAC hides a channel portfolio with individual ratios ranging from 0.8x (brand-misaligned display) to 8.4x (organic referral), with very different payback profiles.
Sample size
~18,400 customers across 7 channels in the Q1 2023 cohort
Collected
2023-01/2025-01
License
CC-BY-4.0 for cited figures
Used in · cohort-based-unit-economics · clv-control-variable-bid-strategies
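The aggregation fallacy is easiest to see with a toy portfolio. The sketch below uses hypothetical channel figures (not the dataset's) to show that the blended LTV/CAC is a CAC-weighted average of channel ratios, so one healthy-looking number can hide channels on both sides of 1.0x. Payback is computed as CAC divided by monthly gross margin per customer, an assumed definition.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    customers: int
    ltv: float             # lifetime value per customer
    cac: float             # acquisition cost per customer
    monthly_margin: float  # gross margin per customer per month

    @property
    def ltv_cac(self) -> float:
        return self.ltv / self.cac

    @property
    def payback_months(self) -> float:
        return self.cac / self.monthly_margin

def blended_ltv_cac(channels):
    """Portfolio ratio: total LTV over total CAC (a CAC-weighted mean of channel ratios)."""
    total_ltv = sum(c.customers * c.ltv for c in channels.values())
    total_cac = sum(c.customers * c.cac for c in channels.values())
    return total_ltv / total_cac

# Hypothetical channels, chosen only to illustrate the effect.
portfolio = {
    "organic_referral": Channel(2_000, ltv=840.0, cac=100.0, monthly_margin=35.0),
    "display": Channel(3_000, ltv=240.0, cac=300.0, monthly_margin=10.0),
    "paid_search": Channel(5_000, ltv=600.0, cac=200.0, monthly_margin=25.0),
}

for name, ch in portfolio.items():
    print(f"{name:>17}: LTV/CAC {ch.ltv_cac:.1f}x, payback {ch.payback_months:.1f} months")
print(f"blended: {blended_ltv_cac(portfolio):.2f}x")
```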
Test Duration Reduction from Bayesian vs Frequentist A/B Testing
JSON →
Head-to-head comparison of decision latency between Bayesian posterior-probability testing and classical frequentist fixed-sample testing across 48 production experiments. Median time-to-decision dropped 36% under Bayesian methodology with no increase in downstream product regret.
Sample size
48 experiments, ~29M visitor-sessions total
Collected
2024-Q1/2025-Q1
License
CC-BY-4.0 for aggregate figures
Used in · bayesian-ab-testing-practice
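A minimal sketch of posterior-probability testing for a conversion metric, assuming independent Beta(1, 1) priors on each arm's conversion rate and Monte Carlo draws from the Beta posteriors. The 0.95/0.05 stopping thresholds are hypothetical; the 48 experiments may have used different priors, metrics, or decision rules.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0, prior=(1.0, 1.0)):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta-Binomial posteriors."""
    rng = random.Random(seed)
    a0, b0 = prior
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(a0 + conv_a, b0 + n_a - conv_a)
        rate_b = rng.betavariate(a0 + conv_b, b0 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

def decide(p_b_better, upper=0.95, lower=0.05):
    """Hypothetical stopping rule applied to the posterior probability."""
    if p_b_better >= upper:
        return "ship B"
    if p_b_better <= lower:
        return "keep A"
    return "keep collecting"

# Hypothetical interim data: 100/1000 conversions on A, 150/1000 on B.
p = prob_b_beats_a(100, 1000, 150, 1000)
```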
Cox Proportional Hazards, SaaS Churn Covariates
JSON →
Fitted hazard ratios for ten covariates on 18-month SaaS subscriber survival. Feature usage depth and onboarding completion dominate (hazard ratios 0.34 and 0.41 respectively); price tier and annual billing have smaller but significant effects. Shows that churn is primarily a product-engagement phenomenon, not a pricing phenomenon.
Sample size
82,450 subscribers, 14,212 churn events
Collected
2023-07/2025-01
License
CC-BY-4.0 for cited figures
Used in · survival-analysis-subscription-businesses
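Hazard ratios from a Cox model combine multiplicatively: under proportional hazards h(t | x) = h0(t) · exp(Σ β_k x_k), and each reported ratio is exp(β_k). A minimal sketch using the two headline ratios above, assuming each covariate is coded so that x = 1 means "deep usage" or "onboarding completed" and that the two do not interact (the dataset's actual coding is not stated here); the baseline survival value is hypothetical.

```python
import math

# Headline hazard ratios from the dataset description; the 0/1 coding is an assumption.
HAZARD_RATIOS = {"feature_usage_depth": 0.34, "onboarding_completed": 0.41}

def relative_hazard(covariates, hazard_ratios=HAZARD_RATIOS):
    """exp(sum beta_k * x_k), i.e. the product of HR_k ** x_k, relative to baseline."""
    log_hr = sum(math.log(hazard_ratios[k]) * x for k, x in covariates.items())
    return math.exp(log_hr)

def survival(baseline_survival, rel_hazard):
    """Under proportional hazards, S(t | x) = S0(t) ** relative_hazard."""
    return baseline_survival ** rel_hazard

rh = relative_hazard({"feature_usage_depth": 1, "onboarding_completed": 1})
# Hypothetical baseline: 60% of reference subscribers survive 18 months.
s18 = survival(0.60, rh)
```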

E-commerce ML · 3 datasets

Learning-to-Rank Revenue Lift by Objective Function
JSON →
Incremental revenue per session from different ranking objective functions on an e-commerce search result page. Revenue-weighted composite (relevance × margin × projected LTV) outperforms pure relevance ranking by 23% in GMV per session, with neutral effect on relevance perception.
Sample size
~14.2M search sessions, 4 variants
Collected
2024-08/2024-10
License
CC-BY-4.0 for cited figures
Used in · search-ranking-revenue-optimization-l2r
Transformer Product Embeddings, CTR Lift vs Collaborative Filtering
JSON →
CTR and downstream conversion lift from replacing a matrix-factorization collaborative filter with transformer-based session embeddings (BERT4Rec-style). Transformer embeddings lift CTR by 18-32% across cold-start, returning-user, and category-diverse segments.
Sample size
~6.2M users, 4 segments
Collected
2024-10/2024-12
License
CC-BY-4.0 for cited figures
Used in · transformer-product-embeddings-ecommerce · cold-start-problem-few-shot-learning
Uplift Modeling, Persuadable Share by Customer Segment
JSON →
Share of customers falling into each of the four uplift quadrants (sure-thing, persuadable, lost-cause, do-not-disturb) for a promotional email campaign, decomposed by customer segment. Only 18% of the audience is genuinely persuadable; 64% of promotional budget is historically wasted on the other three groups.
Sample size
~1.6M customers, 4-segment decomposition
Collected
2024-Q3/Q4
License
CC-BY-4.0 for cited figures
Used in · personalized-promotion-uplift-modeling
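The four quadrants come from comparing each customer's expected behavior with and without the promotion. A minimal sketch, assuming a model already supplies per-customer purchase probabilities under treatment and control (for example from a two-model approach) and using a hypothetical 0.5 cutoff to turn probabilities into buy/no-buy calls; production uplift models usually rank customers by p_treated - p_control rather than applying a hard threshold.

```python
from collections import Counter

def uplift_quadrant(p_treated, p_control, threshold=0.5):
    """Assign a customer to one of the four uplift quadrants."""
    buys_if_treated = p_treated >= threshold
    buys_if_untreated = p_control >= threshold
    if buys_if_treated and buys_if_untreated:
        return "sure-thing"
    if buys_if_treated:
        return "persuadable"
    if buys_if_untreated:
        return "do-not-disturb"
    return "lost-cause"

def quadrant_shares(predictions, threshold=0.5):
    """Share of customers per quadrant, from (p_treated, p_control) pairs."""
    counts = Counter(uplift_quadrant(pt, pc, threshold) for pt, pc in predictions)
    total = sum(counts.values())
    return {q: n / total for q, n in counts.items()}

# Hypothetical predictions for four customers.
shares = quadrant_shares([(0.9, 0.8), (0.7, 0.2), (0.1, 0.1), (0.2, 0.6)])
```

Only the "persuadable" group responds because of the promotion; spend on the other three is the waste the dataset measures.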

Marketing Strategy · 3 datasets

Cost per Attention Second by Media Format
JSON →
CPAS (cost per attention second) computed across 12 digital and traditional media formats from eye-tracking and dwell-inferred attention data. Display banners, the cheapest format on CPM, are the most expensive on attention. Connected-TV and audio invert the traditional CPM-based ROI ranking.
Sample size
~120M measured impressions across 7 studies
Collected
2022-03/2024-11
License
CC-BY-4.0 for cited figures
Used in · attention-economics-cognitive-load-advertising
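CPAS converts a price per thousand impressions into a price per second of attention: divide the cost of one impression (CPM / 1,000) by the average attention seconds that impression earns. A minimal sketch with hypothetical CPMs and attention times (not figures from the studies) shows how the CPM ranking can invert.

```python
def cpas(cpm, attention_seconds_per_impression):
    """Cost per attention second: per-impression cost over attention seconds per impression."""
    return (cpm / 1_000) / attention_seconds_per_impression

# Hypothetical inputs: (CPM in dollars, average attention seconds per impression).
formats = {"display_banner": (2.00, 0.5), "connected_tv": (30.00, 15.0)}

# Cheapest per attention second first.
ranked = sorted(formats, key=lambda f: cpas(*formats[f]))
for f in ranked:
    print(f"{f}: CPM ${formats[f][0]:.2f}, CPAS ${cpas(*formats[f]):.4f}")
```

The banner is 15x cheaper on CPM here but twice as expensive per attention second.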
Creative Fatigue Decay by Impression Band
JSON →
Relative response (click-through and post-click conversion) as the same creative is shown repeatedly to the same audience, segmented by audience-frequency decile. Fatigue sets in earlier than industry convention assumes; entropy-based detection flags decay 2-4 weeks before CTR collapses.
Sample size
~3.8B impressions, 14 campaigns
Collected
2024-Q2/2025-Q1
License
CC-BY-4.0 for cited figures
Used in · creative-fatigue-detection-entropy-metrics
Content Moat, Traffic per Article as Archive Grows
JSON →
Traffic per article as a niche content archive grows from 1 to 200+ articles. Per-article traffic compounds with archive size rather than staying flat (a network effect driven by internal linking and topical authority): at identical quality, the 50th article gets 2.8× the traffic of the 1st.
Sample size
8 sites, 1,420 articles tracked
Collected
2022-01/2025-01
License
CC-BY-4.0 for cited figures
Used in · compounding-advantage-content-moats-seo
