Event Taxonomy Design as Data Engineering

TL;DR: Most product event taxonomies are designed by marketing or growth teams in the first six months of a product's life, debated for an afternoon, encoded in a Google Sheet, and then drift for the next three years. The result is a tracking plan in which the same logical action fires under three different event names across web, iOS, and Android, payload shapes vary by SDK, PII leaks into properties that were never meant to carry it, and every downstream consumer (analytics, CDPs, attribution, ML features) is reasoning over an unreliable substrate. Treating event taxonomy as a data-engineering problem (with naming conventions, an entity-event split, PII boundaries enforced at the schema layer, additive-versus-breaking versioning, and SDK-side validation at write time) is the discipline that holds taxonomies together for a decade. This essay maps the design decisions, the failure modes, and the practitioner patterns from Snowplow, Segment, Mixpanel, Amplitude, and the broader analytics-engineering literature.

A note on the named vendors. Segment, Mixpanel, Amplitude, Iterable, Snowplow, and the broader analytics-engineering ecosystem appear throughout as widely-cited reference points, not as data sources. The drift-rate figures, schema-validation effort estimates, and taxonomy-revision costs come from advisory engagements with anonymized partner operators across SaaS, marketplace, and consumer-app archetypes, and are presented as observed ranges rather than vendor benchmarks. Where a specific vendor capability is described, it is sourced to the public documentation.

Why Event Taxonomy Is a Schema Problem, Not a Marketing Problem

The pattern is recognizable across most product organizations. The first version of the tracking plan is drafted by a growth product manager or a marketing analyst. The document is a list of events with English-language descriptions: "User signs up." "User completes onboarding." "User adds item to cart." The events are organized by funnel stage rather than by data structure. The payloads are described loosely ("include the plan tier and the source") rather than with strict types. The tracking plan is then handed to the engineering team, who implement each event in roughly the spelling and shape that the document specified, with minor variations dictated by what is easy to log from each platform's SDK.

Six months later, the tracking plan and the actual event stream have diverged in three distinct ways. First, the same logical event has different names across surfaces: user_signed_up on web, User Signed Up on iOS, signup_completed on Android. Second, the same event has different payload shapes: the web version sends a plan_tier property; the iOS version sends plan; the Android version omits the property when the plan is free. Third, events that nobody documented are also firing because engineers added instrumentation that the analytics team was not consulted on, and because the marketing team requested ad-hoc events directly from product engineers via Slack messages that never made it back into the tracking plan.

The downstream cost compounds. The funnel report shows a 12 percent drop between two steps that is actually a tracking artifact. The CDP segment for "users who signed up in the last 30 days" undercounts on Android because the segment filters on an event name and a property that the Android client does not send. The ML feature for plan-tier mix has nulls where the underlying user is on the free plan, and the model treats nulls as a separate category. The attribution model deduplicates conversions by event name and double-counts users who hit both user_signed_up and User Signed Up in the same session. The analytics team spends a third of their time debugging numbers and two-thirds explaining the discrepancies to stakeholders.

The reframe is that the tracking plan is the user-facing document and the event taxonomy is the underlying schema. The schema has the same structural requirements as any other data schema: stable naming, typed payloads, version-controlled evolution, validation at write time, observable drift, deprecation lifecycle. Treating the taxonomy as a marketing artifact (a list of "things we want to know") rather than as a data-engineering artifact (a schema with constraints) is the structural error that produces every downstream problem.

Naming Conventions: Object-Action-Context vs Verb-Noun

Two dominant naming conventions exist in practice, and the choice between them shapes everything that follows. The first is verb-noun: signed_up, completed_onboarding, added_to_cart. This is the convention that Mixpanel and most early product-analytics tools encouraged, and it reads cleanly in English. The second is object-action-context: account.created, onboarding.completed, cart.item_added, with an optional trailing context segment such as account.created.organic. This is the convention that Segment's Spec, Snowplow's self-describing schemas, and most data-warehouse-first organizations have converged on.

The verb-noun convention reads better. The object-action-context convention scales better. The difference becomes visible the moment the catalog reaches roughly 50 events. With verb-noun, the team has signed_up, signed_in, signed_out, subscribed, unsubscribed, clicked_signup, viewed_signup_page, and the analytics team is doing string-search on event names to find related events. With object-action-context, the team has account.created, session.started, session.ended, subscription.started, subscription.cancelled, and the events sort, group, and filter naturally by object.

The deeper advantage of object-action-context is that it forces the team to name the entities explicitly. Verb-noun events tend to drift toward UI-derived names (clicked_blue_button) because there is no structural pressure to name the underlying object. Object-action-context events fail loudly when the object is undefined: if the team cannot name the object, the event probably should not exist, or it belongs to an object the team has not yet modeled.
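To make the "fails loudly" property concrete, here is a minimal sketch of a name check and an object-level grouping. The regular expression, the function names, and the sample catalog are illustrative assumptions, not part of any vendor's tooling.

```python
import re
from collections import defaultdict

# Assumed convention: object.action or object.action.context, each segment snake_case.
EVENT_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)?")


def is_valid_event_name(name: str) -> bool:
    """Return True when the name decomposes into object.action[.context]."""
    return EVENT_NAME.fullmatch(name) is not None


def group_by_object(names):
    """Group valid event names by their leading object segment."""
    groups = defaultdict(list)
    for name in names:
        if is_valid_event_name(name):
            groups[name.split(".")[0]].append(name)
    return dict(groups)


# Hypothetical catalog: four well-formed names and one UI-derived name.
catalog = ["account.created", "session.started", "session.ended",
           "cart.item_added", "clicked_blue_button"]
rejected = [name for name in catalog if not is_valid_event_name(name)]
grouped = group_by_object(catalog)
```

A check like this can run in schema review, so that a UI-derived name is rejected before it ships rather than discovered in the warehouse.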

Event Naming Convention Comparison, Practitioner Observations

Convention	Reads Well at 20 Events	Reads Well at 200 Events	Forces Entity Naming	Common Vendors
Verb-noun (signed_up, added_to_cart)	Yes	No: event list becomes hard to navigate	No: UI-derived names creep in	Mixpanel, early Amplitude
Verb-noun-context (signed_up_web, added_to_cart_mobile)	Yes, but verbose	Partly: groupings emerge but conventions vary	Weakly	Heap, some Amplitude
Object.action (account.created, cart.item_added)	Slightly more formal	Yes: natural grouping and filtering	Yes: object must be named	Segment Spec, Snowplow, modern Amplitude
Object.action.context (account.created.organic, cart.item_added.promo)	Verbose for small catalogs	Yes, with strong discipline	Yes, and context adds dimension without payload bloat	Snowplow, large-org deployments
Title Case with spaces (Account Created, Cart Item Added)	Yes	Partly	Weakly	Segment Spec (display names), some Iterable

The case for object-action-context strengthens as the team gets larger and as more downstream consumers join. A small product with three engineers and one analyst can survive on verb-noun. A product with twenty engineers, three analysts, two ML teams, a CDP, and an external attribution vendor cannot. The choice should be made early, not because it is irreversible (renaming events is possible, with discipline) but because the cost of renaming 200 events six months in is much higher than the cost of starting with the right convention.

The Segment Spec, which is one of the most widely adopted conventions, uses Title Case event names (Account Created, Order Completed) with snake_case property names. This is a third, hybrid option that captures most of the object-action benefits while remaining readable. The honest read is that the specific casing matters less than the structural pattern: object first, action second, with the team's discipline to add no events that cannot be decomposed into an object and an action.

The Entity-Event Split

Underneath any well-formed event taxonomy is an entity model. The entity model names the objects in the product's domain (account, user, organization, project, document, subscription, payment, item, cart, session) and assigns each one a stable identifier. Events relate to entities: an account.created event has an account_id; an order.completed event has an order_id, a cart_id, and a user_id. The events carry the changes; the entities carry the state.

The entity-event split is the analytics-engineering equivalent of the Kimball star schema in traditional data warehousing. The events are the fact tables; the entities are the dimensions. The downstream warehouse tables (typically organized in dbt as staging, intermediate, and marts) materialize the entity state by aggregating events, and the marts join entities to other entities for reporting.
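As a rough illustration of "materializing entity state by aggregating events", the sketch below folds an event stream into one current-state row per account. In practice this would usually be SQL in a dbt model; the event shapes, field names, and integer timestamps here are assumptions made for brevity.

```python
from typing import Any


def materialize_accounts(events: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Fold events (facts) into one current-state row per account (dimension)."""
    accounts: dict[str, dict[str, Any]] = {}
    # Apply events in time order so later changes overwrite earlier state.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        row = accounts.setdefault(event["account_id"], {"account_id": event["account_id"]})
        if event["name"] == "account.created":
            row["created_at"] = event["timestamp"]
            row["plan_tier"] = event["plan_tier"]
        elif event["name"] == "subscription.upgraded":
            row["plan_tier"] = event["plan_tier"]
    return accounts


# Hypothetical events, deliberately out of order.
events = [
    {"name": "subscription.upgraded", "account_id": "acc_1", "timestamp": 20, "plan_tier": "pro"},
    {"name": "account.created", "account_id": "acc_1", "timestamp": 10, "plan_tier": "free"},
    {"name": "account.created", "account_id": "acc_2", "timestamp": 15, "plan_tier": "free"},
]
dim_account = materialize_accounts(events)
```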

The teams that get this split right have a small number of well-defined entities (often 5 to 12) and a much larger number of events (typically 60 to 300). Each event has a clear primary entity (the entity it changes) and zero or more secondary entities (the entities it references). Properties on the event are either intrinsic to the event itself (the timestamp, the source, the event ID) or are denormalized snapshots of entity state at the moment of the event (the plan tier at the moment of subscription.upgraded, captured because the entity state may have changed by the time the warehouse aggregates).
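One way to picture the resulting event shape is an envelope with a primary entity, optional secondary entities, intrinsic fields, and a snapshot of entity state. The class and field names below are hypothetical; the identifiers are made up for illustration.

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class EntityRef:
    entity: str      # entity type, e.g. "order", "cart", "user"
    entity_id: str   # stable identifier for that entity


@dataclass(frozen=True)
class Event:
    name: str                  # object.action, e.g. "order.completed"
    primary: EntityRef         # the entity this event changes
    timestamp: str             # intrinsic: when the event happened
    source: str                # intrinsic: where it was emitted from
    secondary: tuple = ()      # entities the event references
    snapshot: dict = field(default_factory=dict)  # entity state at event time
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self):
        # An entity is either the one being changed or one being referenced, not both.
        if self.primary in self.secondary:
            raise ValueError("primary entity must not repeat as a secondary entity")


order_completed = Event(
    name="order.completed",
    primary=EntityRef("order", "ord_1"),
    timestamp="2024-05-01T12:00:00Z",
    source="web",
    secondary=(EntityRef("cart", "cart_9"), EntityRef("user", "usr_a3f9c1b8e2")),
    snapshot={"plan_tier": "pro"},
)
```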

The entity-event split and its warehouse materialization

The most common mistake we see is the team treating every event as standalone rather than as an event-on-an-entity. The symptom is events with names like submitted_form (which form? on what entity?) and clicked_button (which button? on what page? on what entity?). These events are unfilterable without joining to a different event for context, which means every downstream query is a self-join and the warehouse becomes slow and confusing. Events that name their entity explicitly (signup_form.submitted on the account entity, cta.clicked on the pricing_page entity) avoid this entirely.

The entity-event split also makes PII boundaries enforceable, which is the topic of the next section.

PII Boundaries Baked Into the Schema

The teams that retrofit PII handling onto an event taxonomy after the fact almost always discover PII in places they did not expect. An account.updated event includes a free-text notes field that engineers thought would be empty but is now full of customer-support text. A support_ticket.created event has a description property with email addresses inside. A search.performed event has a query property that occasionally contains the user's name because the user typed it. The retrofit is expensive: every event needs to be re-audited, the warehouse needs to be re-cleaned, the downstream exports to ad platforms need to be quarantined and refiltered.

The teams that bake PII boundaries into the schema avoid this by making the boundary visible at the property level. Every property in the schema carries a sensitivity tag: public (any consumer), pseudonymous (consumers who have signed a DPA), pii (consumers with explicit purpose authorization), or forbidden (do not log). The SDK validates the tag at write time and refuses to send any property tagged forbidden. The collector rejects payloads where a property's actual value is structurally inconsistent with its tag (for example, a value matching an email pattern in a property tagged public).
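A minimal sketch of those two checks follows, assuming a hypothetical tag map and a deliberately simple email pattern. A real deployment would load the tags from the schema registry and use a broader set of detectors.

```python
import re
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    PSEUDONYMOUS = "pseudonymous"
    PII = "pii"
    FORBIDDEN = "forbidden"


# Hypothetical tag map; in practice generated from the schema definition.
PROPERTY_TAGS = {
    "plan_tier": Sensitivity.PUBLIC,
    "user_id": Sensitivity.PSEUDONYMOUS,
    "email": Sensitivity.PII,
    "password": Sensitivity.FORBIDDEN,
}

# Simplified email detector, for illustration only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def sdk_check(props: dict) -> None:
    """SDK side: refuse untagged properties and anything tagged forbidden."""
    for name in props:
        tag = PROPERTY_TAGS.get(name)
        if tag is None:
            raise ValueError(f"{name}: no sensitivity tag in schema")
        if tag is Sensitivity.FORBIDDEN:
            raise ValueError(f"{name}: tagged forbidden, refusing to send")


def collector_violations(props: dict) -> list[str]:
    """Collector side: list properties whose value looks like an email
    even though the tag says the property is not PII."""
    return [
        name for name, value in props.items()
        if PROPERTY_TAGS.get(name) in (Sensitivity.PUBLIC, Sensitivity.PSEUDONYMOUS)
        and isinstance(value, str)
        and EMAIL_PATTERN.search(value)
    ]
```

For instance, collector_violations({"plan_tier": "user@example.com"}) flags plan_tier, while the same address in the email property passes because that property is already tagged pii.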

The mechanism is simple and the operational benefit is large. The downstream exports to ad platforms read the schema metadata and filter out anything above their authorized sensitivity tier. The data scientists working on look-alike modeling get a pseudonymous view of the events. The CS team working on ticket triage gets a pii view. The vendor integrations with limited DPAs get a public view only. Each consumer reads from a view of the data that is appropriate to their authorization, and the schema, not a human, enforces the boundary.
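The per-consumer view can be sketched as a filter over the tag metadata. The tier ordering and tag map below are assumptions for illustration; note that an untagged property is dropped rather than passed through, so the filter fails closed.

```python
# Assumed ordering of tiers, lowest sensitivity first.
TIER_RANK = {"public": 0, "pseudonymous": 1, "pii": 2}

# Hypothetical tag metadata, as it might be read from the schema.
PROPERTY_TAGS = {
    "plan_tier": "public",
    "referrer_domain": "public",
    "user_id": "pseudonymous",
    "email": "pii",
}


def view_for(event: dict, authorized_tier: str) -> dict:
    """Return only the properties at or below the consumer's authorized tier."""
    ceiling = TIER_RANK[authorized_tier]
    view = {}
    for name, value in event.items():
        tag = PROPERTY_TAGS.get(name)
        # Untagged or unknown tags are excluded (fail closed).
        if tag in TIER_RANK and TIER_RANK[tag] <= ceiling:
            view[name] = value
    return view


event = {
    "plan_tier": "pro",
    "user_id": "usr_a3f9c1b8e2",
    "email": "user@example.com",
    "debug_blob": "untagged value",
}
vendor_view = view_for(event, "public")
modeling_view = view_for(event, "pseudonymous")
support_view = view_for(event, "pii")
```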

PII Sensitivity Tag Examples for Common Event Properties

Property	Typical Value	Sensitivity Tag	Downstream Filter
user_id (internal)	usr_a3f9c1b8e2	pseudonymous	Most consumers; hash before export to third parties
email_hash (SHA-256)	f3a8...	pseudonymous	Most consumers; allowed for conversions API (CAPI) uploads
email (raw)	user@example.com	pii	Authorized internal only; never to vendors
plan_tier	pro	public	All consumers
ip_address	203.0.113.42	pseudonymous	Truncated to /24 for most consumers
device_id	ABC123-...	pseudonymous	Most consumers; not exported raw
search_query	(free text)	pii (variable)	PII-scrubbing pipeline; pattern-based filter
support_notes	(free text)	pii	Authorized internal only; full DLP scan
referrer_url	https://...	pseudonymous	Strip query params with email or token patterns
referrer_domain	google.com	public	All consumers

The pattern that scales is to make the sensitivity tag a required field in the schema definition itself. A property added without a tag is rejected at schema-review time. A property tagged public that later starts carrying PII (because a feature changed) fails the collector's runtime validation and surfaces the issue in monitoring before it leaks. The cost of building this infrastructure is real (a few weeks of engineering for the validation layer, plus the schema-review discipline), and the cost of not building it is the eventual PII-leak incident that costs a multiple of that.
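The schema-review rule ("a property added without a tag is rejected") is easy to automate. The sketch below assumes a hypothetical schema layout with a sensitivity key per property; it could run as a CI step on pull requests against the schema repository.

```python
ALLOWED_TAGS = {"public", "pseudonymous", "pii", "forbidden"}


def review_errors(schema: dict) -> list[str]:
    """Return one message per property that lacks a valid sensitivity tag."""
    errors = []
    for name, spec in sorted(schema["properties"].items()):
        tag = spec.get("sensitivity")
        if tag is None:
            errors.append(f"{schema['event']}.{name}: missing sensitivity tag")
        elif tag not in ALLOWED_TAGS:
            errors.append(f"{schema['event']}.{name}: unknown sensitivity tag {tag!r}")
    return errors


# Hypothetical schema under review.
search_schema = {
    "event": "search.performed",
    "properties": {
        "query": {"type": "string", "sensitivity": "pii"},
        "result_count": {"type": "integer"},
        "locale": {"type": "string", "sensitivity": "internal"},
    },
}
errors = review_errors(search_schema)
```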

GDPR Article 5(1)(c) on data minimization and the corresponding CCPA provisions establish that data collection should be limited to what is necessary for the stated purpose. The schema-tagged approach is the operating mechanism for this principle: the schema declares what each property is for and what sensitivity it carries, and the validation layer enforces that the actual data matches the declaration. The compliance argument is downstream of the engineering argument.

Versioning: Additive vs Breaking, and Why Semver Helps

Once an event taxonomy is in production, every change to it is a schema migration. Some migrations are additive (adding a new optional property; adding a new event) and some are breaking (renaming an existing property; changing a property's type; removing an event). The two require different change-control disciplines, and conflating them is one of the most common ways that taxonomies decay.

The convention that has emerged across Snowplow and other rigorous implementations is semver-style versioning at the event level. An event has a version like 1-0-0 (major-minor-patch; Snowplow's own scheme, SchemaVer, uses the hyphenated form and names the three parts MODEL-REVISION-ADDITION). Under the mapping used in this essay, a new optional property bumps the patch version (1-0-1). A new required property bumps the minor version (1-1-0). Renaming a property, changing a type, or removing a property bumps the major version (2-0-0) and creates a new event spec that downstream consumers must opt into.
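The bump rules in the paragraph above can be sketched as a comparison of two property maps. The schema representation (property name mapped to a type and a required flag) is an assumption for illustration; a rename shows up as one removal plus one addition and is therefore treated as breaking.

```python
def next_version(version: str, old: dict, new: dict) -> str:
    """Compute the next major-minor-patch version from a schema diff.

    old/new map property name -> {"type": str, "required": bool}.
    """
    major, minor, patch = (int(part) for part in version.split("-"))
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    retyped = [k for k in old.keys() & new.keys() if old[k]["type"] != new[k]["type"]]

    if removed or retyped:                      # breaking: removal, rename, type change
        return f"{major + 1}-0-0"
    if any(new[k]["required"] for k in added):  # new required property
        return f"{major}-{minor + 1}-0"
    if added:                                   # new optional property
        return f"{major}-{minor}-{patch + 1}"
    return version                              # nothing changed


base = {"account_id": {"type": "string", "required": True},
        "plan_tier": {"type": "string", "required": True}}
with_optional = {**base, "source": {"type": "string", "required": False}}
with_required = {**base, "source": {"type": "string", "required": True}}
renamed = {"account_id": base["account_id"],
           "plan": {"type": "string", "required": True}}
```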

The advantage of this convention is that consumers can declare which major version they support, and the analytics platform can route events to the right consumer based on the version. The CDP that was built against account.created v1 continues to receive v1 events; the new attribution model that needs the additional property in account.created v2 receives v2 events; both versions are emitted in parallel during the migration window. The change is staged and observable, not a flag day.
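A small sketch of parallel emission and version-based routing, under assumed names: the consumer registry, the payload fields, and the v2 rename of plan_tier to plan are all hypothetical.

```python
# Hypothetical registry: which major version of each event a consumer supports.
SUBSCRIPTIONS = {
    "cdp": {"account.created": 1},
    "attribution": {"account.created": 2},
}


def major_of(version: str) -> int:
    """Extract the major component from a version such as '2-0-0'."""
    return int(version.split("-")[0])


def route(event: dict) -> list[str]:
    """Return the consumers that declared support for this event's major version."""
    major = major_of(event["version"])
    return sorted(
        consumer for consumer, wanted in SUBSCRIPTIONS.items()
        if wanted.get(event["name"]) == major
    )


def emit_parallel(account_id: str, plan_tier: str, source: str) -> list[dict]:
    """During the migration window, emit both the v1 and the v2 shape."""
    v1 = {"name": "account.created", "version": "1-0-0",
          "account_id": account_id, "plan_tier": plan_tier}
    # Assumed breaking change in v2: plan_tier renamed to plan, source added.
    v2 = {"name": "account.created", "version": "2-0-0",
          "account_id": account_id, "plan": plan_tier, "source": source}
    return [v1, v2]


deliveries = {event["version"]: route(event)
              for event in emit_parallel("acc_1", "pro", "organic")}
```

When the migration window closes, removing the v1 branch from emit_parallel is the whole sunset on the producer side; consumers still on v1 are visible in the registry before that happens.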

The cost is operational: the team must maintain multiple versions in parallel, the schema registry must support versioning, and the SDK must respect the version when emitting events. This is the same discipline that any well-run API has (REST API versioning, GraphQL schema evolution, Protocol Buffer field number conventions), and the same arguments apply. The teams that build it once amortize the cost across years of taxonomy evolution. The teams that avoid it accumulate breaking changes that, eventually, force a global rewrite.

Event Taxonomy Drift Rate by Versioning Discipline, Across Partner Operators (2022-2024)

The drift-rate chart shows the divergence we observe in advisory engagements: a fully versioned taxonomy (every event has a major-minor-patch version, every change goes through schema review, deprecations have explicit sunset dates) drifts at about 6 to 9 percent per year, mostly from genuinely new events. An unversioned taxonomy (events are added freely, properties are renamed in-place, deprecated events are not removed) drifts at 30 to 60 percent over two years, with most of the drift being dead events that nobody removes and silent renames that nobody catches. A semi-versioned taxonomy (versioning exists but is not enforced) sits between, closer to unversioned than to versioned, because the absence of enforcement is the structural problem.

Contrary to the Conventional View

Conventional view

The right way to keep an event taxonomy clean is for the analytics team to be careful and disciplined.

What the evidence shows

Discipline does not scale. Twenty engineers, three product managers, and a marketing team will overwhelm any analyst's discipline within six months. The mechanism that scales is automation: schema validation at the SDK layer, collector-side rejection of malformed payloads, a schema registry that requires pull-request review for changes, and a deprecation pipeline that enforces sunset dates. The teams that succeed treat the taxonomy as code (with the same review, testing, and deployment discipline as the application code) rather than as documentation that humans are supposed to keep in sync. Documentation maintained by hand drifts; a schema guarded by tests and review drifts far more slowly.

Tracking Plans as Living Documents

The tracking plan, as distinct from the taxonomy, is the human-readable projection of the taxonomy onto product surfaces. It answers questions like "where does this event fire" and "what is the user trying to accomplish when this event fires," questions that the schema itself does not capture. Most tracking plans we have seen start as a Google Sheet, evolve into a Notion or Confluence document, and eventually become an instance of Iteratively (now part of Amplitude), Avo, Mixpanel Lexicon, or a custom tool.

The tooling matters less than the operating discipline. A tracking plan that survives a year has three properties. First, it is the single source of truth: the schema in code, the SDK validation, and the warehouse documentation are all generated from or validated against the tracking plan, not maintained in parallel. Second, it has clear ownership: every event has a named owner (a team or a person) who is on the hook for its accuracy, and a contact for downstream consumers who have questions. Third, it has a change-control process: adding an event, changing a property, or deprecating an event requires a review, a pull request, and an explicit approval; ad-hoc additions are not allowed.

The teams that violate any of these three properties tend to discover the cost six to eighteen months later, when the tracking plan in the document has diverged from the tracking plan in the code, when no one knows who owns the legacy_signup_event_v2 that accounts for 4 percent of total event volume, and when nobody can find the changelog for the property rename that broke the attribution model in May.

The mature pattern, used by larger analytics organizations, is to maintain the tracking plan in a versioned repository (often a tracking-plan repo with YAML or JSON schema files), generate the SDK type definitions from it (so that mistyped event calls are compile-time errors in TypeScript or Swift), validate at the collector against the same schema, and auto-publish the human-readable documentation from the schema repository. The cost is a few engineering weeks of toolchain investment. The benefit is that the tracking plan stops drifting.
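The code-generation step can be small. The sketch below renders a Python TypedDict definition from a schema entry, so that a static checker such as mypy can flag a misspelled property; the same idea applies to generated TypeScript or Swift types. The schema layout, the type map, and the function names are assumptions.

```python
# Assumed mapping from schema types to Python annotations.
TYPE_MAP = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def class_name(event_name: str) -> str:
    """Turn 'cart.item_added' into 'CartItemAdded'."""
    parts = event_name.replace(".", "_").split("_")
    return "".join(part.capitalize() for part in parts)


def render_typed_dict(event_name: str, properties: dict) -> str:
    """Render source code for a TypedDict describing the event payload."""
    lines = [f"class {class_name(event_name)}(TypedDict):"]
    for prop, spec in sorted(properties.items()):
        lines.append(f"    {prop}: {TYPE_MAP[spec['type']]}")
    if not properties:
        lines.append("    pass")
    return "\n".join(lines)


# Hypothetical schema entry from the tracking-plan repository.
source_code = render_typed_dict(
    "account.created",
    {"account_id": {"type": "string"}, "plan_tier": {"type": "string"}},
)
```

The generated text would be written to a module that application code imports, with "from typing import TypedDict" at the top of that module.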

SDK-Side Validation at Write Time

The cheapest place to catch a tracking error is in the SDK, before the event is sent. The next-cheapest is at the collector, before the event is persisted. The most expensive is in the warehouse, after the event has been logged with the wrong shape and downstream consumers have been reading it for weeks.

SDK-side validation is the discipline of treating the tracking plan as a typed contract that the SDK enforces. In TypeScript, this looks like a generated type definition that requires the calling code to pass the correct properties with the correct types: track('account.created', { account_id: string, plan_tier: 'free' | 'pro' | 'enterprise', source: string }). A call that passes the wrong property name or the wrong type is a compile-time error, caught during code review rather than in production. The same pattern works in Swift, Kotlin, Go, and most typed languages.

In untyped languages (the JavaScript-without-TypeScript case, or Python), the equivalent is runtime validation: the SDK reads the schema at startup, checks each track call against the schema, and throws (in development) or logs an error and drops the event (in production) when validation fails. The development-vs-production distinction is important: throwing in production breaks the user experience, but silently logging in development hides errors that should have been caught.
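A minimal sketch of that development-versus-production behavior, assuming a hypothetical in-memory schema table and a caller-supplied send function; a real SDK would fetch the schema at startup and validate more than property names and types.

```python
import logging

logger = logging.getLogger("tracking")

# Hypothetical schema table: event name -> required property types.
SCHEMAS = {
    "account.created": {"account_id": str, "plan_tier": str, "source": str},
}


class TrackingError(ValueError):
    """Raised in development when a track call violates the schema."""


def validation_errors(name: str, props: dict) -> list[str]:
    """Return a list of human-readable schema violations (empty when valid)."""
    schema = SCHEMAS.get(name)
    if schema is None:
        return [f"unknown event {name!r}"]
    errors = [f"missing property {key!r}" for key in schema if key not in props]
    errors += [f"unexpected property {key!r}" for key in props if key not in schema]
    errors += [
        f"property {key!r} should be {expected.__name__}"
        for key, expected in schema.items()
        if key in props and not isinstance(props[key], expected)
    ]
    return errors


def track(name: str, props: dict, *, env: str, send) -> bool:
    """Send a valid event; on failure, throw in development, log and drop in production."""
    errors = validation_errors(name, props)
    if not errors:
        send(name, props)
        return True
    if env == "development":
        raise TrackingError("; ".join(errors))
    logger.error("dropped %s: %s", name, "; ".join(errors))
    return False
```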

Snowplow's Iglu schema registry and the corresponding JSON-Schema-based validation is the most rigorous public implementation of this pattern. The pipeline resolves each event's schema from Iglu, validates every event payload against it during enrichment (the step immediately downstream of the collector), and routes failing events to a separate "bad events" stream that the analytics team can monitor. The cost is the schema registry infrastructure and the discipline of registering schemas before events go live. The benefit is that the events that land in the warehouse are guaranteed to match the schema, which is a substantial reduction in downstream cleaning work.

Validation Strategy Comparison by SDK and Language, Practitioner Patterns

Strategy	Where Errors Are Caught	Setup Effort	Runtime Cost	Best For
Typed SDK (TypeScript, Swift, Kotlin)	Compile time	Medium: code generation from schema	Near zero: no per-event schema check at runtime	Most product teams
Runtime SDK validation (JS, Python)	Dev: throw; Prod: log + drop	Low: schema fetch + JSON validation	Low: per-event JSON-Schema check	JS-only stacks; legacy
Collector-side validation only	Server-side, after network round-trip	Low to medium: collector configuration	Low: at ingest	Bad SDK options or legacy clients
Warehouse-side dbt tests	Post-load, often hours after the fact	Medium: dbt test authoring	Low: scheduled test runs	Catching drift that earlier layers missed
No validation (the default)	When the dashboard breaks	Zero	Zero, until the cost surfaces downstream	Not recommended at any scale

The teams that combine typed SDKs (where the language allows) with collector-side validation get two layers of defense and catch the vast majority of taxonomy violations before any downstream consumer reads the bad data. The teams that rely only on warehouse-side dbt tests catch the errors days or weeks later, by which time the bad data is already cached in downstream systems and the cleanup is correspondingly more expensive.

The Cost of Taxonomy Drift

The cumulative cost of an unmanaged taxonomy is hard to measure in any single quarter and obvious in retrospect over years. The drift surfaces as a tax on every analytical question: "what does this event actually mean," "why does this property have nulls in 30 percent of rows," "why do these two events report different numbers for what should be the same thing." The analytics team spends progressively more time on disambiguation and less on analysis, and the product organization slowly loses trust in the data.

The mechanism by which trust erodes is specific and predictable. A senior leader asks a question. The analytics team produces an answer. A second analyst, asked the same question, produces a different answer because they made different choices about which events to filter in, which to filter out, and which to deduplicate. The leader notices the discrepancy and concludes (correctly, given the evidence visible to them) that the data is unreliable. The next request is qualified with "and can you confirm this with three different cuts," which doubles the work and makes the team slower. Eventually the leader stops asking and goes back to intuition, which is the worst possible outcome for an analytics function.

The cost of preventing the drift is the engineering investment in schema validation, versioning, and tracking-plan automation: typically 4 to 12 engineer-weeks for the initial buildout in a mid-market organization, plus an ongoing maintenance load of 2 to 4 hours per week for schema review and deprecation management. The cost of allowing the drift is the analytics team's time spent disambiguating, plus the periodic taxonomy rewrites (we have observed full rewrites every 18 to 36 months at organizations without versioning discipline, each rewrite consuming 8 to 20 engineer-weeks plus significant analyst time), plus the harder-to-measure cost of the trust erosion.
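The ranges above can be put on a common footing with some back-of-the-envelope arithmetic. The sketch below assumes a 36-month horizon and a 40-hour engineer-week, both of which are assumptions added here; it deliberately leaves out analyst time and trust erosion, which the text treats as real but hard to measure.

```python
HOURS_PER_ENGINEER_WEEK = 40   # assumption
HORIZON_MONTHS = 36            # assumption: compare both paths over three years


def prevention_cost(buildout_weeks: float, maintenance_hours_per_week: float) -> float:
    """Engineer-weeks for buildout plus ongoing schema review over the horizon."""
    calendar_weeks = HORIZON_MONTHS * 52 / 12
    maintenance_weeks = maintenance_hours_per_week * calendar_weeks / HOURS_PER_ENGINEER_WEEK
    return round(buildout_weeks + maintenance_weeks, 1)


def rewrite_cost(interval_months: int, weeks_per_rewrite: float) -> float:
    """Engineer-weeks spent on full taxonomy rewrites that complete within the horizon."""
    rewrites = HORIZON_MONTHS // interval_months
    return rewrites * weeks_per_rewrite


prevention_low = prevention_cost(4, 2)     # 4 + 2 * 156 / 40
prevention_high = prevention_cost(12, 4)   # 12 + 4 * 156 / 40
drift_low = rewrite_cost(36, 8)            # one rewrite at the cheap end
drift_high = rewrite_cost(18, 20)          # two rewrites at the expensive end
```

Under these assumptions prevention comes to roughly 12 to 28 engineer-weeks over three years, against 8 to 40 engineer-weeks for rewrites alone, before any of the analyst disambiguation time is counted.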

Patterns From Snowplow, Segment, Mixpanel, Amplitude, and Iterable

Each of the major analytics-platform vendors has converged on a slightly different version of the same underlying ideas, and looking across them shows the pattern by triangulation.

Snowplow is the most rigorous end of the spectrum. Every event references a self-describing JSON Schema, registered in an Iglu schema registry, validated in the pipeline before loading, and routed to a typed warehouse table. The discipline is high and the operational overhead is real. Snowplow's documentation on data structures reads like a database-design textbook rather than a marketing guide, which is the right tone for the audience.

Segment sits in the middle. The Segment Spec defines a small set of canonical API calls (Identify, Track, Page, Screen, Group, Alias) with conventional event and property names, and Protocols is the schema-validation add-on that catches violations at the integration layer. The default experience is more forgiving than Snowplow; the optional Protocols layer brings the validation discipline back.

Mixpanel has historically been more flexible than either, with Mixpanel Lexicon as the tracking-plan governance layer added more recently. The Mixpanel default is closer to verb-noun naming and the property model is more permissive, which is fine for smaller teams and increasingly painful as the team grows.

Amplitude has converged closer to Segment's pattern, with Iteratively (acquired in 2021) as the typed-SDK and tracking-plan tooling. Amplitude's Data Taxonomy documentation is one of the more accessible practitioner guides.

Iterable is positioned as the customer-engagement platform rather than a product analytics tool, but the event-data discipline is similar. The Iterable event schema documentation emphasizes payload typing and event grouping in similar ways.

Vendor Validation Strictness vs Setup Effort, Across Practitioner Engagements

The pattern across vendors is that the validation strictness is correlated with the setup effort, and there is no free lunch. A team that wants Snowplow-level rigor pays for it in operational overhead. A team that wants Mixpanel-style flexibility pays for it in eventual drift. The honest choice depends on the team's analytical maturity and the cost of downstream errors. Teams that report financial numbers from event data need higher strictness; teams running product-discovery analytics can tolerate lower strictness; the choice is not absolute and should be revisited as the team grows.

A Twelve-Month Implementation Order

For a team starting from a typical Google-Sheet-and-engineering-Slack-DMs tracking plan and aiming for a versioned, validated, properly-owned taxonomy, the order of investment matters. The pieces interact, and a partial implementation is often worse than no implementation (because it creates the appearance of rigor without the substance).

1. Inventory and decay audit. Catalog every event currently firing, count by volume over the last 90 days, and identify the events that are no longer used. This is the unglamorous starting work that surfaces the actual scale of the cleanup.
2. Naming-convention decision and entity model. Pick the naming convention (object-action-context recommended for any team above 30 events), name the 5 to 12 canonical entities, and map every existing event to an entity. Events that do not map cleanly are flagged for renaming or removal.
3. PII tagging. For every property on every event, assign a sensitivity tag. This is where most teams discover PII in unexpected places. The output is a tagged schema document.
4. Schema repository and validation. Move the tracking plan from the document tool to a versioned repository (YAML or JSON). Wire SDK-side type generation for typed languages, runtime validation for untyped languages, and collector-side validation as a backstop.
5. Versioning and change-control process. Pull-request review for schema changes, semantic versioning, deprecation lifecycle with explicit sunset dates. The discipline that prevents drift.
6. Migration of existing events to the new schema. One entity at a time, with parallel emission of old and new events during transition. Downstream consumers migrate at their own pace within the transition window.
7. Sunset of legacy events. After the transition window (typically 90 to 180 days), remove the old event names from the schema and from active emission. Keep them in the schema registry as deprecated, so that historical data remains queryable.
8. Tracking-plan documentation auto-generation. Wire the documentation site to the schema repository, so that the human-readable tracking plan is always in sync with the schema. The document stops drifting because there is no human in the loop.
9. Ongoing review cadence. Quarterly schema review, monthly deprecation review, weekly bad-events monitoring. The maintenance load that keeps the system honest.
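The first step in the list, the inventory and decay audit, is mostly a counting exercise. The sketch below assumes the event log is available as (event name, date) pairs and the tracking plan as a list of documented names; in practice this would usually be a warehouse query, and all the sample data here is made up.

```python
from collections import Counter
from datetime import date, timedelta


def audit(event_log, tracking_plan, today: date, window_days: int = 90) -> dict:
    """Count recent volume per event and compare against the documented plan."""
    cutoff = today - timedelta(days=window_days)
    volume = Counter(name for name, day in event_log if day >= cutoff)
    return {
        "volume": volume.most_common(),                            # busiest first
        "undocumented": sorted(set(volume) - set(tracking_plan)),  # firing, not in plan
        "dead": sorted(set(tracking_plan) - set(volume)),          # in plan, not firing
    }


today = date(2024, 6, 30)
event_log = [
    ("account.created", date(2024, 6, 1)),
    ("account.created", date(2024, 6, 2)),
    ("legacy_signup_event_v2", date(2024, 5, 20)),
    ("cart.item_added", date(2023, 12, 1)),   # outside the 90-day window
]
tracking_plan = ["account.created", "cart.item_added", "order.completed"]
report = audit(event_log, tracking_plan, today)
```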

The full implementation takes nine to fifteen months for a mid-market analytics organization with one to three analytics engineers and supportive product-engineering partners. The teams that move faster than this usually skip the inventory or the PII tagging, and the cost surfaces six months later when the migration runs into events that were missed or PII that was not flagged. The teams that move slower than this usually do not have organizational commitment to the work, and the project stalls in step three or four.

The taxonomy is the schema. The tracking plan is the document. The instrumentation is the implementation. The three are different artifacts, and the team that conflates them is the team whose data does not survive the next reorganization.

The taxonomy that survives a decade is not the taxonomy that was perfectly designed on day one. It is the taxonomy that was designed well enough on day one and was then maintained with the discipline of a software project: version control, code review, testing, deprecation, documentation generated from source. That discipline is what separates analytics functions that scale from analytics functions that periodically collapse and start over.

Key Takeaways

Event taxonomies are schema problems, not marketing problems. The teams that treat the taxonomy as a data-engineering artifact (with naming conventions, typed payloads, versioning, validation) avoid the drift that catches teams who treat it as a Google Sheet of marketing requirements.
The object-action-context naming convention (account.created, cart.item_added) scales better than verb-noun (signed_up, added_to_cart) past about 50 events, because it forces explicit entity naming and produces natural grouping. The Segment Spec's Title-Case variant is a workable hybrid.
The entity-event split is the analytics-engineering equivalent of the star schema. Events are facts, entities are dimensions, and well-formed events name the entity they change and the entities they reference. Events that name no entity (submitted_form, clicked_button) are unfilterable without self-joins.
PII boundaries belong in the schema. Every property carries a sensitivity tag, and the validation layer enforces the boundary at write time. Retrofitting PII handling after the fact almost always discovers leaks in places nobody expected.
Semantic versioning at the event level (major-minor-patch) separates additive from breaking changes and enables parallel emission during migrations. The teams that adopt versioning early drift at 6 to 9 percent per year; the teams that do not drift at 30 to 60 percent over two years in advisory data we have observed.
The tracking plan should be the single source of truth, automatically synchronized with the SDK validation and the warehouse documentation. The mature pattern is a versioned schema repository with code-generated type definitions and auto-generated human-readable documentation.
SDK-side validation is the cheapest place to catch errors. Typed SDKs (TypeScript, Swift, Kotlin) catch tracking mistakes at compile time; runtime validation catches them in development. Warehouse-side dbt tests are a useful backstop, not a primary defense.
The vendor landscape sits on a strictness-versus-setup-effort curve, with Snowplow at the rigorous end and stock Mixpanel at the flexible end. The choice depends on the team's analytical maturity and the cost of downstream errors; financial-reporting teams need higher strictness than product-discovery teams.
The full implementation from "Google Sheet tracking plan" to "versioned, validated, owned taxonomy" takes nine to fifteen months for a mid-market organization. The teams that skip the inventory or PII tagging discover the cost six months later when the migration hits the events they missed.