TL;DR: Ask four people in your company what "revenue" means and you will get four different numbers -- Finance, Product, Marketing, and Sales each calculate it differently. Data teams spend roughly 30% of their time answering questions about metric definitions rather than analyzing data. A formal metric ontology (semantic layer) that codifies definitions, dimensions, and business rules eliminates this ambiguity and is the prerequisite for any self-serve analytics that does not collapse under its own contradictions.
The Revenue Problem
Here is an exercise that will tell you more about your organization's data maturity than any technology audit. Walk into a room with your CFO, your VP of Product, your Head of Marketing, and your Head of Sales. Ask each of them: "What was our revenue last month?"
You will get four different numbers.
The CFO will give you GAAP-recognized revenue, net of refunds, adjusted for deferred revenue and contract modifications. The VP of Product will give you a number based on completed transactions in the product database, probably gross of refunds because the refund logic lives in a different system. The Head of Marketing will give you attributed revenue from campaigns, which double-counts users who touched multiple channels. The Head of Sales will give you closed-won bookings, which include contracts that haven't started generating actual revenue yet.
Each person is right within their own frame of reference. Each person's number is defensible. And the organization, as a whole, has no idea what its revenue actually is.
This is not a data engineering problem. The warehouse has the data. The pipelines run. The tables are populated. This is a definition problem, a failure of shared meaning at the most fundamental level. The data is fine. The semantics are broken.
The consequences of this ambiguity are not abstract. They are measured in hours of meetings spent reconciling dashboards, in strategic decisions made on numbers that different executives interpret differently, in the slow erosion of trust that occurs when the data team produces a report and three stakeholders immediately question whether the metric was calculated "their way."
A 2023 survey by Atlan found that data teams spend roughly 30% of their time answering questions about metric definitions -- not computing metrics, not building pipelines, not analyzing data, but explaining what the numbers mean and why they differ from what someone else reported. This is a key reason most organizations stall at the descriptive stage of analytics and never progress to the prescriptive stage described in "From Dashboards to Decision Systems." This is an organizational tax on every analytical activity the company undertakes.
The problem scales with the organization. A ten-person startup can maintain shared metric definitions through conversation. A five-hundred-person company cannot. And a five-thousand-person company, without a formal system for metric definition, will have metric conflicts embedded in every major business process.
The chart above is not hypothetical. It is a composite drawn from real audit exercises across mid-stage SaaS companies. The variance between the lowest and highest number is typically 30-50%, driven entirely by definitional differences: what counts as revenue, when it counts, whether refunds are netted, whether trials are included, whether partner-channel revenue is attributed.
This is the problem that metric ontology is designed to solve.
Ontology: A Word Borrowed for Good Reason
Ontology, in philosophy, is the study of what exists, the fundamental categories of being and how they relate to each other. Aristotle's Categories is arguably the first ontological framework in Western thought, an attempt to classify everything that can be said about anything into a structured taxonomy.
The word migrated into information science in the 1990s, where it took on a more specific and operational meaning: a formal specification of a shared conceptualization. Tom Gruber's 1993 definition remains the standard. An ontology, in the information science sense, is a structured representation of a domain's concepts and the relationships between them, codified in a way that machines can process and humans can agree on.
This is precisely what a metric layer needs to be.
When an analyst writes a SQL query that says SUM(amount) WHERE status = 'completed', they have made a series of implicit ontological commitments. They have decided that "amount" is the measure. They have decided that "completed" is the relevant filter. They have decided that summation is the correct aggregation. They have decided that whatever time range the query covers is the correct grain. None of these decisions are visible in the query itself. They live in the analyst's head, and they will be different in the next analyst's head. The analytics engineering discipline -- version-controlled SQL models with testing and documentation -- provides the infrastructure to make these commitments explicit and enforceable.
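To see how many decisions hide in one line, here is that same ad hoc query with its implicit commitments annotated. This is a sketch; the orders table and its columns are assumed for illustration.

```sql
-- The ad hoc query, with its implicit ontological commitments annotated.
-- Table and column names are assumed for illustration.
SELECT SUM(amount)            -- measure: amount; aggregation: sum. Gross or net?
FROM orders                   -- source: which orders table, at what grain?
WHERE status = 'completed';   -- filter: 'completed' by whose definition?
-- time grain: whatever range the caller had in mind -- invisible here.
```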
A metric ontology makes these commitments explicit. It takes the implicit knowledge embedded in thousands of ad hoc queries and codifies it into a formal structure that the entire organization can reference, debate, and ultimately agree upon.
The philosophical parallel is not decorative. Ontology in philosophy is concerned with resolving ambiguity about what exists and what categories things belong to. Metric ontology is concerned with resolving ambiguity about what a number means and what business concept it represents. The structure of the problem is identical.
Anatomy of a Metric: Measures, Dimensions, Filters, Time Grains
A well-designed metric ontology decomposes every business metric into four constituent elements. This decomposition is not arbitrary; it reflects the actual degrees of freedom that create ambiguity when metrics are defined informally.
Measures are the quantitative values being aggregated. Revenue, order count, session duration, conversion events. A measure has a base column in a data table, an aggregation function (sum, count, average, count distinct, min, max), and a unit of measurement. The measure is the "what."
Dimensions are the categorical attributes by which a measure can be sliced. Region, product line, customer segment, acquisition channel. Dimensions answer the question "by what?" and they must be enumerated explicitly, because not every dimension is valid for every measure. Slicing "monthly recurring revenue" by "page URL" is nonsensical, and the ontology should prevent it.
Filters are the constraints that define the scope of the metric. "Revenue" might mean all revenue, or it might mean revenue excluding refunds, or revenue from enterprise customers only, or revenue from the North American region. Filters are where most definitional conflicts hide, because they are the element most often left implicit. When these definitions are ambiguous, even well-designed experiments suffer -- Bayesian A/B testing can only produce trustworthy results when the metric being tested has an unambiguous definition that all stakeholders agree on.
Time grains define the temporal resolution of the metric. Daily, weekly, monthly, quarterly, trailing-30-day. The time grain interacts with the measure in ways that are not always obvious: "monthly revenue" computed as a calendar-month sum will differ from "monthly revenue" computed as a trailing-30-day window, and both will differ from "monthly revenue" computed as an annualized run rate divided by twelve.
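The divergence is easy to demonstrate. Here is a minimal Postgres-style sketch, assuming a fct_orders table with order_date and amount_usd columns (hypothetical names): both queries below claim to be "monthly revenue," and they will rarely agree.

```sql
-- Calendar-month grain: sum everything inside each calendar month.
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(amount_usd)                 AS revenue_calendar_month
FROM analytics.fct_orders
GROUP BY 1;

-- Trailing-30-day grain: a rolling window ending on each day.
SELECT
  order_date,
  SUM(SUM(amount_usd)) OVER (
    ORDER BY order_date
    RANGE BETWEEN INTERVAL '29 days' PRECEDING AND CURRENT ROW
  ) AS revenue_trailing_30d
FROM analytics.fct_orders
GROUP BY order_date;
```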
Table 1: The Four Components of a Metric Definition
| Component | Definition | Example (Revenue) | Where Ambiguity Hides |
|---|---|---|---|
| Measure | The quantitative value being aggregated | SUM(transaction_amount) | Which column? Which aggregation? Gross or net? |
| Dimension | Categorical attribute for slicing | Region, Product Line, Customer Segment | Which dimensions are valid? Are they consistent across sources? |
| Filter | Scope constraints on the metric | WHERE status = 'completed' AND refund = false | Implicit filters that differ by team. What counts as completed? |
| Time Grain | Temporal resolution | Calendar month, trailing 30 days, fiscal quarter | Calendar vs. fiscal, point-in-time vs. cumulative, timezone handling |
Every metric conflict in an organization can be traced to a disagreement about one or more of these four components. The Finance team uses a different filter than the Product team. Marketing uses a different time grain than Sales. The executive dashboard uses a different aggregation than the operational report.
The purpose of a metric ontology is to make these components explicit, versioned, and governed, so that when two people refer to "revenue," they are either referring to the same formal definition or they are consciously referring to different named variants ("gross_revenue" vs. "net_revenue" vs. "recognized_revenue"), each with its own documented specification.
This is the difference between a metric being a number and a metric being a concept with a definition. The former is what you get from a SQL query. The latter is what you get from an ontology.
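Concretely, a pair of named variants might be declared like this in MetricFlow-style YAML. This is a sketch, assuming an orders model with an order_total measure and an is_refunded dimension (both hypothetical names); the two definitions differ only in an explicit, documented filter.

```yaml
# Hypothetical variant declarations: one word, two governed definitions.
metrics:
  - name: gross_revenue
    description: "All completed transaction amounts, gross of refunds."
    type: simple
    type_params:
      measure: order_total
  - name: net_revenue
    description: "Gross revenue excluding refunded orders."
    type: simple
    type_params:
      measure: order_total
    filter: |
      {{ Dimension('order_id__is_refunded') }} = false
```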
The "Single Source of Truth" Myth
The phrase "single source of truth" has become one of the most overused and least examined ideas in data engineering. It sounds correct. It sounds like something a mature organization should have. It is, in its naive formulation, both impossible and counterproductive.
The naive version of SSOT says: there should be one number for every metric, and everyone in the organization should use that number. This formulation collapses under the slightest contact with organizational reality.
Revenue, as we have already established, is not one thing. It is a family of related concepts that share a word. GAAP revenue is a legal and accounting construct governed by ASC 606. Bookings revenue is a sales pipeline construct. Product revenue is a transactional construct. Marketing-attributed revenue is a channel-allocation construct. These are not errors or inconsistencies; they are legitimately different perspectives on the same underlying business activity.
A metric ontology does not enforce a single source of truth. It enforces a single source of definitions. The distinction matters enormously.
In a single-source-of-definitions model, the organization maintains a canonical registry of all metric definitions. Each definition specifies its measure, dimensions, filters, and time grain. When the Finance team needs GAAP revenue, they reference the revenue_gaap_recognized metric. When the Product team needs transaction volume, they reference revenue_gross_transactions. Both metrics are defined in the same ontology, with explicit documentation of how they differ and why.
The source of truth is not the number. The source of truth is the definition. The numbers can legitimately differ, as long as the definitions are clear and the lineage is traceable.
This reframing has practical consequences. It means the data team's job is not to produce "the right number" -- an impossible task when different stakeholders need different numbers for legitimate reasons. The data team's job is to produce a governed set of named definitions and ensure that every dashboard, report, and analysis references one of those definitions explicitly.
When a stakeholder says "the revenue number on my dashboard is wrong," the response shifts from "let me check the pipeline" to "which revenue definition is your dashboard using, and which one do you think it should be using?" This moves the conversation from a debugging exercise to a definitional one, which is where it belonged all along.
The Semantic Layer Wars: MetricFlow, dbt Metrics, and Cube.js
The concept of a "semantic layer" -- a logical abstraction that sits between the data warehouse and the consumption tools, translating business concepts into SQL -- is not new. Business Objects had a semantic layer in the 1990s. Looker's LookML was, in essence, a modern reimagining of the same idea. What has changed is that the semantic layer is now being treated as an independent, composable infrastructure component, decoupled from any single BI tool.
Three platforms have emerged as the primary contenders in what can reasonably be called the semantic layer wars.
MetricFlow (now part of dbt Labs) takes a metrics-first approach. Metrics are defined as YAML configurations that specify measures, dimensions, and time grains. The MetricFlow engine compiles these definitions into optimized SQL at query time. The core bet is that metrics should be defined once in the transformation layer and consumed everywhere: in BI tools, notebooks, embedded analytics, and APIs. Since the dbt Labs acquisition of Transform (the company that built MetricFlow), this has become the de facto metrics layer for dbt-centric data stacks.
dbt Metrics (the original implementation, prior to MetricFlow integration) introduced the concept of metrics-as-code within the dbt ecosystem. The early implementation had limitations: it could not handle complex derived metrics or multi-hop joins efficiently, which led to the Transform acquisition. The current dbt metrics layer is MetricFlow under the hood, but the evolutionary path matters because it illustrates how the community iterated toward a solution.
Cube.js (now Cube) takes a different architectural approach. Cube positions itself as a "headless BI" platform, a semantic layer with a built-in caching and API layer that can serve metrics to any frontend. Where MetricFlow is tightly integrated with dbt and the transformation layer, Cube operates as a standalone service that connects to the warehouse and exposes metrics through REST and GraphQL APIs. The core bet is that the semantic layer should be an API, not a compile-time artifact.
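For contrast, here is a sketch of roughly the same orders model expressed in Cube's YAML data model. Table and column names are assumptions carried over from this post's examples; the point is that the definition is consumed over Cube's REST or GraphQL API rather than compiled into the transformation layer.

```yaml
# Hypothetical Cube data model for the same orders data.
cubes:
  - name: orders
    sql_table: analytics.fct_orders
    measures:
      - name: revenue_gross
        sql: amount_usd
        type: sum
      - name: order_count
        type: count
    dimensions:
      - name: customer_segment
        sql: customer_segment
        type: string
      - name: order_date
        sql: order_date
        type: time

# A consumer would then POST to /cubejs-api/v1/load with a query like:
# { "measures": ["orders.revenue_gross"],
#   "timeDimensions": [{ "dimension": "orders.order_date",
#                        "granularity": "month" }] }
```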
The strategic question for data teams is not "which tool is best" -- that depends entirely on the existing stack and organizational context. The strategic question is whether the semantic layer should live in the transformation layer (the MetricFlow bet), in a standalone service (the Cube bet), or in the BI tool itself (the legacy approach that Looker pioneered and that is now being challenged).
The trend is clearly toward extraction: pulling metric definitions out of BI tools and into a layer that multiple tools can consume. This is driven by a practical reality: organizations use multiple BI tools. The executive team uses Tableau. The product team uses Looker. The data science team uses Jupyter. The engineering team queries the warehouse directly. If metric definitions live inside any one of these tools, they are invisible to the others, and definitional drift is inevitable.
A metric ontology, properly implemented, sits beneath all of these tools and serves as the canonical reference for what every metric means, regardless of where it is consumed.
When Revenue Means Three Different Things: Handling Metric Conflicts
Metric conflicts are not bugs. They are a natural consequence of organizational specialization. The Finance team, the Product team, and the Marketing team have different jobs, different incentive structures, and different analytical needs. They will, quite rationally, define the same word differently.
The wrong response to this is to force alignment, to declare that there is One True Revenue and everyone must use it. This fails because the different definitions serve different legitimate purposes. GAAP revenue exists because regulators and investors need a standardized accounting treatment. Product revenue exists because the engineering team needs to measure transaction system health. Marketing-attributed revenue exists because the marketing team needs to evaluate channel performance.
The right response is to make the conflict explicit, name the variants, and govern the relationships between them.
Here is a concrete example. A mid-stage SaaS company discovers that its "revenue" metric has three active definitions in production dashboards:
Table 2: Three Legitimate Revenue Definitions in One Organization
| Metric Name | Owner | Definition | Typical Monthly Value | Use Case |
|---|---|---|---|---|
| revenue_gaap_recognized | Finance | ASC 606 recognized revenue, net of refunds, adjusted for deferred revenue | $12.4M | Board reporting, financial statements, investor relations |
| revenue_gross_transactions | Product | Sum of all completed transaction amounts, gross of refunds, point-of-sale timing | $14.1M | Product health monitoring, transaction system KPIs, real-time dashboards |
| revenue_attributed_marketing | Marketing | Revenue from customers who touched a marketing channel in the attribution window, multi-touch weighted | $16.8M | Campaign ROI, channel optimization, budget allocation |
The gap between $12.4M and $16.8M is not an error. It is the natural result of three different teams measuring three different things and calling all of them "revenue." The ontological solution is:
- Name each variant explicitly. No metric is called simply "revenue." Every metric has a qualified name that specifies its variant.
- Document the differences. Each metric definition includes a prose explanation of what it includes, what it excludes, and why.
- Map the relationships. The ontology should specify that revenue_gross_transactions minus refunds minus deferred revenue adjustments approximately equals revenue_gaap_recognized, and that revenue_attributed_marketing will always be higher than both because of multi-touch attribution overlap (see the reconciliation sketch after this list).
- Assign ownership. Each metric variant has a designated owner -- the team responsible for its definition, not its computation.
- Version the definitions. When a definition changes (as it will when accounting standards update, or when the attribution model changes), the change is versioned and documented.
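The relationship-mapping step can even be enforced mechanically. Here is a hedged sketch of a reconciliation check that could run in CI or a data-quality tool; the table names and the tolerance are illustrative assumptions, not a prescribed schema.

```sql
-- Assert the documented relationship between revenue variants.
-- Table names and the ~2% tolerance are illustrative assumptions.
WITH variants AS (
  SELECT
    (SELECT SUM(amount_usd) FROM analytics.fct_orders
      WHERE status = 'completed')                        AS gross_transactions,
    (SELECT SUM(refund_usd) FROM analytics.fct_refunds)  AS refunds,
    (SELECT SUM(recognized_usd)
       FROM finance.fct_revenue_recognized)              AS gaap_recognized
)
SELECT
  gross_transactions - refunds AS derived_net,
  gaap_recognized,
  ABS((gross_transactions - refunds) - gaap_recognized)
    / NULLIF(gaap_recognized, 0) AS relative_gap
FROM variants;
-- Fail the check if relative_gap exceeds ~0.02 after known deferral timing.
```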
This approach does not eliminate disagreement. It channels disagreement into a structured process. When the CFO and the VP of Marketing argue about revenue, they are no longer arguing about whose number is right. They are arguing about which metric variant is appropriate for a specific decision context, a much more productive conversation.
The Metric Governance Framework
Metric governance is the organizational process by which metric definitions are proposed, reviewed, approved, published, and maintained. Without governance, a metric ontology degrades into the same chaos it was designed to prevent, just with YAML files instead of ad hoc queries.
A functional governance framework has five components:
1. Metric Proposal Process. Any team can propose a new metric or a modification to an existing one. The proposal must specify the four components (measure, dimensions, filters, time grain), the business justification, and the relationship to existing metrics. This is a pull request, not a committee meeting.
2. Review and Approval. A cross-functional metrics council, typically including representatives from data engineering, finance, product, and analytics, reviews proposals. The review is not about whether the metric is "correct" (multiple definitions can be correct). It is about whether the metric is well-specified, non-redundant, and appropriately named.
3. Publication and Documentation. Approved metrics are published to the metric registry: the canonical catalog that all tools and teams reference. Each metric entry includes its formal definition, its owner, its lineage (which source tables it depends on), and its relationships to other metrics. A sketch of such an entry follows this list.
4. Usage Monitoring. The governance framework tracks which metrics are actually being used, by whom, and in what contexts. Metrics that are defined but never queried are candidates for deprecation. Metrics that are heavily used but frequently questioned are candidates for improved documentation or redefinition.
5. Lifecycle Management. Metrics are not permanent. Business models change. Accounting standards evolve. Products are retired. The governance framework includes a deprecation process: a way to retire metrics gracefully, with notice periods and migration paths for downstream consumers.
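Here is what a registry entry might carry, sketched as YAML. The schema is hypothetical -- no semantic layer ships exactly these fields today -- but it shows the metadata the five components above imply.

```yaml
# Hypothetical metric-registry entry with governance metadata.
metric: revenue_gaap_recognized
version: 3                          # bumped whenever the definition changes
owner: finance-analytics            # owns the definition, not the pipeline
status: published                   # proposed -> approved -> published -> deprecated
definition:
  measure: SUM(recognized_usd)
  dimensions: [region, product_line]
  filters: "refunds netted; deferred revenue adjusted per ASC 606"
  time_grain: fiscal_month
related:
  - revenue_gross_transactions      # differs by refunds and deferral timing
sla:
  freshness: "T+5 business days"
  completeness: ">= 99.5% of expected ledger entries"
changelog:
  - v3: "Excluded partner-channel rebates per updated rev-rec policy."
```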
The chart above illustrates a pattern observed across organizations that have implemented metric governance versus those that have not. In ungoverned environments, the number of distinct metric definitions grows exponentially: every new dashboard, every new analyst, every new business question produces new ad hoc definitions. Within eighteen months, the organization has hundreds of metric definitions, many of them redundant, many of them subtly inconsistent, and no one can tell which ones are authoritative.
In governed environments, the growth is linear and controlled. New metrics are added deliberately, redundancies are caught during review, and the total count reflects the actual complexity of the business rather than the entropy of the analytics process.
The governance overhead is real: the metrics council must meet, proposals must be written and reviewed, documentation must be maintained. But the alternative -- the ungoverned state -- imposes a much higher cost in reconciliation time, decision-making confusion, and institutional distrust of data.
Composable Metrics and Derived Measures
One of the most powerful properties of a well-designed metric ontology is composability: the ability to define new metrics as combinations of existing ones, with the ontology engine handling the computational details.
A base metric is defined directly against source data. Revenue, order count, active users, session count. These are the atomic units of the ontology.
A derived metric is defined as a mathematical operation on base metrics. Average order value is revenue divided by order count. Conversion rate is purchases divided by sessions. Customer acquisition cost is marketing spend divided by new customers acquired.
A composite metric combines metrics across different time windows or entity groups. Net revenue retention is a composite that requires current-period revenue from a prior-period cohort, divided by prior-period revenue from that same cohort. This involves time-shifted joins that are notoriously easy to get wrong in ad hoc SQL.
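Because the time-shifted join is exactly where ad hoc implementations go wrong, here is a hedged Postgres-style sketch of net revenue retention, reusing the hypothetical fct_orders table from earlier.

```sql
-- Net revenue retention: this month's revenue from last month's cohort,
-- divided by that cohort's revenue last month. Churned customers stay in
-- the denominator; brand-new customers are excluded from the numerator.
WITH monthly AS (
  SELECT
    customer_id,
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount_usd)                 AS revenue
  FROM analytics.fct_orders
  GROUP BY 1, 2
)
SELECT
  prior.month + INTERVAL '1 month'  AS month,
  SUM(COALESCE(curr.revenue, 0))
    / NULLIF(SUM(prior.revenue), 0) AS net_revenue_retention
FROM monthly AS prior
LEFT JOIN monthly AS curr
  ON  curr.customer_id = prior.customer_id
  AND curr.month = prior.month + INTERVAL '1 month'
GROUP BY prior.month;
```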
The power of composability is that derived and composite metrics inherit the definitions of their components. If revenue_gaap_recognized changes its filter to exclude a new category of refunds, every derived metric that uses it -- gross margin, average revenue per user, revenue growth rate -- automatically reflects the change. There is no manual propagation. The ontology handles it.
Here is how a MetricFlow YAML definition captures a derived metric with full semantic context:
```yaml
# semantic_models/revenue.yml
semantic_models:
  - name: orders
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
      - name: customer_segment
        type: categorical
    measures:
      - name: order_total
        agg: sum
        expr: amount_usd
        create_metric: true   # exposes the measure as a simple metric
      - name: order_count
        agg: count
        expr: order_id
        create_metric: true

metrics:
  # refund_total is assumed to be defined on a separate refunds
  # semantic model (not shown here).
  - name: revenue_net
    description: "Net revenue after refunds, as reported to the board."
    type: derived
    type_params:
      expr: order_total - refund_total
      metrics:
        - name: order_total
        - name: refund_total
  - name: average_order_value
    description: "Average revenue per completed order."
    type: derived
    type_params:
      expr: order_total / order_count
      metrics:
        - name: order_total
        - name: order_count
```

When a business user queries the semantic layer, they interact with named metrics rather than raw SQL:
```sql
-- Querying the semantic layer via dbt Semantic Layer API
-- The user selects a metric, dimensions, and filters;
-- MetricFlow compiles it to optimized warehouse SQL.

-- What the analyst sees:
SELECT
  metric('revenue_net'),
  metric('average_order_value'),
  dimension('customer_segment'),
  dimension('order_date', grain='month')
WHERE
  dimension('order_date') >= '2025-01-01'
GROUP BY
  dimension('customer_segment'),
  dimension('order_date', grain='month')

-- What MetricFlow generates (simplified):
SELECT
  customer_segment,
  DATE_TRUNC('month', order_date) AS order_date__month,
  SUM(amount_usd) - SUM(refund_usd) AS revenue_net,
  SUM(amount_usd) / COUNT(order_id) AS average_order_value
FROM analytics.fct_orders
LEFT JOIN analytics.fct_refunds USING (order_id)
WHERE order_date >= '2025-01-01'
GROUP BY 1, 2
```

MetricFlow implements this through a dependency graph. Each metric declares its dependencies, and the engine resolves the graph at query time, generating a single optimized SQL query that computes the metric from its base components. Cube achieves a similar result through its measure composition syntax.
The alternative -- defining each metric independently in SQL, without a dependency graph -- is how most organizations operate today. It means that when a base metric definition changes, every downstream metric must be manually identified and updated. This is the data engineering equivalent of copy-paste programming, and it produces the same category of bugs: subtle inconsistencies that propagate silently through the analytical layer.
Metric SLAs and Data Quality Contracts
A metric definition, no matter how precise, is worthless if the underlying data is unreliable. Metric SLAs -- service level agreements on metric availability and quality -- are the mechanism by which a metric ontology enforces accountability for the data that feeds it.
A metric SLA specifies:
Freshness. How recent must the data be? Some metrics (real-time GMV for an e-commerce platform) need data that is minutes old. Others (monthly GAAP revenue) are computed once and are acceptable at T+5 business days. The SLA makes the expectation explicit.
Completeness. What percentage of expected records must be present for the metric to be considered valid? If the payment processor has a 2-hour outage and 3% of transactions are missing from the daily load, is the revenue metric still publishable? The SLA defines the threshold.
Accuracy. What is the acceptable tolerance for known measurement errors? If click-tracking has a documented 2% under-count due to ad blockers, the SLA documents this and defines it as within tolerance. If the under-count suddenly jumps to 15%, the SLA triggers an alert.
Availability. When is the metric guaranteed to be queryable? A metric that powers a real-time executive dashboard has different availability requirements than one used in a monthly board deck.
Data quality contracts formalize these SLAs between the teams that produce data and the teams that consume it. The contract specifies: what the producing team promises to deliver (in terms of freshness, completeness, and accuracy), what happens when the promise is broken (alerts, incident process, fallback values), and how disputes are resolved.
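A sketch of what such a contract might look like as YAML follows. The schema is hypothetical -- tools like Soda or Great Expectations express equivalent checks in their own syntax -- but the fields correspond to the four SLA dimensions above.

```yaml
# Hypothetical data-quality contract for the revenue_net metric.
contract:
  metric: revenue_net
  producer: payments-data-eng
  consumers: [finance-analytics, exec-dashboards]
  freshness:
    max_lag: 6h                  # data no more than six hours old
  completeness:
    min_row_ratio: 0.97          # vs. trailing 28-day expected volume
  accuracy:
    known_bias: "~2% click undercount from ad blockers (within tolerance)"
    alert_if_deviation_exceeds: 0.05
  availability:
    queryable_by: "08:00 ET daily"
  on_breach:
    - page: payments-oncall
    - mark_metric: degraded      # dashboards surface a quality banner
```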
This is not bureaucracy for its own sake. Without contracts, the default state is that nobody is responsible for data quality and everybody is surprised when it degrades. The metric shows a 40% drop in revenue, and the organization spends four hours determining whether this is a real business event or a pipeline failure. Contracts and SLAs reduce this ambiguity by establishing baselines and triggering alerts before the bad data reaches a dashboard.
Tools like Great Expectations, Soda, Monte Carlo, and Elementary have emerged specifically to automate data quality monitoring against SLA definitions. The integration of these tools with the semantic layer is still immature -- most organizations run quality checks on raw tables rather than on semantic metric definitions -- but the architectural direction is clear: quality checks should be defined at the metric level, not the table level, because the metric is the unit of meaning.
Self-Serve Analytics That Actually Works
Self-serve analytics is one of the most frequently attempted and most frequently failed initiatives in modern data organizations. The pitch is appealing: give business users the tools to answer their own questions, reduce the backlog on the data team, accelerate decision-making. The reality, in most implementations, is that business users are given a BI tool login and a warehouse full of tables they do not understand, and the result is a proliferation of incorrect dashboards that the data team must then audit and correct.
The failure is structural, not motivational. Business users are not failing because they are incompetent or unmotivated. They are failing because the system they are given requires them to make dozens of implicit ontological decisions -- which table to use, which column represents the metric they want, which filters to apply, which join logic to follow -- and they have no training, no documentation, and no guardrails for making those decisions correctly.
A metric ontology is the missing infrastructure that makes self-serve analytics viable. When business users interact with a semantic layer rather than raw tables, the ontological decisions are pre-made and governed. The user does not need to know that revenue lives in the fact_orders table, that it must be filtered by status = 'completed', and that refunds must be netted from a separate table via a left join. The user selects the metric revenue_net from a catalog, chooses their dimensions, and the semantic layer generates the correct SQL.
The data in the chart above is drawn from organizational assessments comparing self-serve programs that provide direct warehouse access against those that route users through a governed semantic layer. The differences are categorical, not incremental. Query accuracy, the percentage of self-serve queries that produce results matching a data-team-validated baseline, jumps from roughly 34% to 89% when a semantic layer is in place. Dashboard consistency, the percentage of dashboards showing the same number for the same metric, moves from 25% to 85%.
But the metric ontology alone is not sufficient. Self-serve analytics also requires:
A curated metric catalog. Not a data dictionary (a list of table columns), but a business-oriented catalog that describes metrics in terms business users understand. The catalog entry for revenue_net should say "Net revenue after refunds, as reported to the board, updated daily by 8am ET", not "SUM(amount) from fact_orders WHERE status = 'completed' LEFT JOIN refunds."
Guardrails against misuse. Not every metric-dimension combination is valid. Slicing daily active users by invoice amount is nonsensical. The semantic layer should prevent invalid combinations, not just allow them and hope the user notices the result makes no sense.
Progressive disclosure of complexity. A marketing manager does not need to see every metric in the ontology. They need to see the metrics relevant to their domain, with the option to explore further if needed. This is a UX problem, and most semantic layer implementations ignore it entirely.
Feedback loops. When a business user queries a metric and gets a result that surprises them, there must be a mechanism to flag the result for review. This feedback loop is how the ontology improves over time: each question about a metric is an opportunity to improve its documentation or identify a gap in the taxonomy.
The Organizational Change Management Challenge
Here is the uncomfortable truth about metric ontology design: the technical implementation is the easy part. Defining metrics in YAML, deploying a semantic layer, connecting BI tools to it -- these are engineering tasks with known solutions. The hard part is getting an organization of humans to agree on definitions, submit to governance, and change the way they have been working for years.
The resistance is predictable and comes from several directions.
Autonomy resistance. Teams that have built their own dashboards and defined their own metrics will resist centralizing those definitions. This is not mere stubbornness: they have legitimate concerns that a centralized process will be slow, will not accommodate their specific needs, and will be governed by people who do not understand their domain.
Ownership conflicts. When the metric ontology requires that every metric have a single owner, teams will fight over who owns contested metrics. "Revenue" is the obvious example, but the conflicts extend to metrics like "active users" (does Product own it? Growth? Marketing?), "churn" (is it Customer Success? Finance? Product?), and "cost per acquisition" (Marketing? Finance? Growth?).
Speed concerns. In an ungoverned environment, an analyst can create a new metric in minutes by writing a SQL query. In a governed environment, they must submit a proposal, wait for review, and follow a naming convention. The perceived loss of speed is a major adoption barrier, even when the ungoverned approach produces inconsistent results.
Cultural inertia. Many organizations have a culture of data as tribal knowledge: specific analysts or teams are known as the "owner" of certain numbers, and their expertise is their job security. A metric ontology makes that knowledge explicit and shared, which can feel threatening.
The change management approach that works, observed across organizations that have successfully implemented metric governance, has several characteristics:
Start with the metrics that cause the most pain. Do not attempt to ontologize every metric at once. Start with the three to five metrics that generate the most internal conflict and reconciliation time. Revenue is almost always one of them. Show that the ontology resolves the conflict, and use that success to build momentum.
Co-create with the domain teams. The data team should not define business metrics in isolation. Each metric definition should be co-authored with the team that owns the business process it measures. The Finance team must co-own the GAAP revenue definition. The Product team must co-own the active users definition. Co-creation builds buy-in and produces better definitions.
Make the governed path faster, not just more correct. If the governed path is slower than the ungoverned path, adoption will stall. The semantic layer must make it genuinely easier to get a correct answer than to write ad hoc SQL. This means investing in the developer experience: fast query compilation, clear error messages, good documentation, and integration with the tools people already use.
Accept that the process will be messy. The first version of the metric ontology will be incomplete. Definitions will be debated. Edge cases will surface. The governance process will feel heavy at first. This is normal. The goal is not perfection at launch; it is a credible system that improves over time.
The Metric Ontology Maturity Model
Based on patterns observed across organizations at different stages of metric governance adoption, a five-level maturity model captures the typical progression:
Level 1: Ad Hoc. No formal metric definitions. Each analyst writes their own SQL. Metric definitions live in individual heads and Slack threads. Conflicts are resolved by whoever is most senior or most persistent. This is the default state of most organizations under 200 people.
Level 2: Documented. Metric definitions are written down in a wiki, spreadsheet, or data dictionary. This is a meaningful improvement over Level 1, but the documentation is disconnected from the actual computation: the wiki says one thing, the SQL says another, and nobody notices until a board meeting.
Level 3: Codified. Metric definitions are implemented in code -- in a semantic layer, a dbt metrics file, or a Cube schema. The definition and the computation are the same artifact. This eliminates the documentation-computation gap but does not address governance or organizational adoption.
Level 4: Governed. Metric definitions are codified, and a governance process controls how they are created, modified, and deprecated. A metrics council reviews changes. Ownership is assigned. SLAs are defined. The metric catalog is the authoritative reference for the organization.
Level 5: Self-Serve. The governed metric ontology is exposed to business users through a curated catalog and a semantic layer that prevents misuse. Business users can answer the majority of their analytical questions without data team intervention. The data team shifts from answering ad hoc requests to maintaining and extending the ontology.
Table 3: Metric Ontology Maturity Model
| Maturity Level | Metric Definitions | Governance | Self-Serve Capability | Typical Org Size |
|---|---|---|---|---|
| Level 1: Ad Hoc | In analysts' heads and SQL queries | None | None; all requests go through the data team | Seed to Series A |
| Level 2: Documented | Written in a wiki or spreadsheet | Informal, wiki-based | Minimal; users read docs but still need help | Series A to B |
| Level 3: Codified | Implemented in semantic layer code | Code review on PRs | Partial; technical users can query the layer | Series B to C |
| Level 4: Governed | Codified with formal review process | Metrics council, ownership, SLAs | Moderate; governed catalog available to power users | Series C to pre-IPO |
| Level 5: Self-Serve | Codified, governed, and catalog-exposed | Full lifecycle governance with feedback loops | High; majority of questions answered without the data team | Post-IPO / Enterprise |
Most organizations that attempt self-serve analytics try to jump from Level 1 directly to Level 5. This fails because each level builds infrastructure and organizational muscle that the next level depends on. You cannot govern what you have not codified. You cannot expose to self-serve what you have not governed. The progression is sequential, and skipping levels produces fragile systems that collapse under the first wave of real usage.
The typical timeline from Level 1 to Level 4 is twelve to eighteen months for an organization that commits to the effort. Level 5 takes an additional six to twelve months and requires sustained investment in the catalog experience and organizational change management.
Further Reading
- MetricFlow (GitHub) -- dbt's semantic layer
- Cube.js -- headless BI and semantic layer
- Ontology in Information Science -- the philosophical foundation
References
- Gruber, T.R. (1993). "A Translation Approach to Portable Ontology Specifications." Knowledge Acquisition, 5(2), 199-220.
- Kimball, R. and Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. 3rd Edition, Wiley.
- dbt Labs (2023). "The dbt Semantic Layer and MetricFlow." dbt Documentation. https://docs.getdbt.com/docs/build/about-metricflow
- Cube Dev (2023). "Cube Documentation: Semantic Layer." https://cube.dev/docs
- Atlan (2023). "State of Data Culture Report: Metric Definition Challenges." Atlan Research.
- Fowler, M. (2003). "Organizing Domain Logic." In Patterns of Enterprise Application Architecture. Addison-Wesley.
- Guarino, N. (1998). "Formal Ontology in Information Systems." Proceedings of FOIS'98, IOS Press, 3-15.
- Stonebraker, M. and Cetintemel, U. (2005). "One Size Fits All: An Idea Whose Time Has Come and Gone." Proceedings of the 21st International Conference on Data Engineering.
- Zhu, X., Song, S., Wang, J., and Yu, P.S. (2019). "Data Quality: Theory, Algorithms, and Management." IEEE Data Engineering Bulletin, 42(2), 3-16.
- Patil, D.J. and Mason, H. (2015). Data Driven: Creating a Data Culture. O'Reilly Media.
- Ereth, J. (2018). "DataOps: Towards a Definition." LWDA Conference Proceedings, 104-112.
- Ehrlinger, L. and Woess, W. (2016). "Towards a Definition of Knowledge Graphs." SEMANTICS Conference, Joint Proceedings of the Posters and Demos Track.
The Conversation
The governance problem is upstream of the semantic layer, not downstream. You can deploy MetricFlow, Cube, LookML, whatever, but if 'active user' has three definitions scattered across finance, growth, and product, the semantic layer just crystallizes the disagreement into code. We ended up doing a 6-week metric definition review with all stakeholders in one room before we shipped. The tooling was the easy part.
strongly agree on the 'ontology before dashboards' order. we did it the other way for 3 years and burned a full quarter consolidating. one thing the post understates: version control for metric definitions is a real requirement. when 'churn' changes from a 30-day to a 35-day window, every downstream dashboard needs to know. we use a git-backed metric registry now and CI fails if the change isn't propagated
respectfully disagree on the idea that self-serve is the end goal. in our experience the 20% of people who actually need deep analytics want SQL access, and the other 80% want curated dashboards a human built for them. self-serve as aspiration leads to a FRANKENSTEIN middle state where nobody is happy. better to design two tracks.
The five-definitions-of-revenue problem is painfully real. At our place it was literally: booked revenue, recognized revenue, net-of-refunds revenue, FX-normalized revenue, and cohort-assigned revenue. All legitimate, all different numbers, all reported as 'revenue' in different meetings. The fix for us was forcing explicit suffixes in the metric layer, you literally can't write 'revenue', you must write one of the five.