Cohort Analysis at the Action-Set Level (Not User-Level)

TL;DR: The default cohort in most analytics tools is the sign-up month, a slice that confuses when a user arrived with what they did once they were there. Action-set cohorts (users who completed a defined sequence of events inside a defined window) predict retention earlier, separate product-fit signal from acquisition noise, and turn a retention curve from a lagging metric into a leading one. The cost is real: an event taxonomy that survives a year, materialized cohort views that refresh on a schedule the rest of the warehouse can rely on, and a de-duplication discipline that prevents the same user from leaking into the same cohort more than once. This essay maps the conceptual shift, the infrastructure required to support it, and the failure modes we have observed in advisory work.

A note on the examples. Amplitude, Mixpanel, and Heap appear as well-known product-analytics archetypes, not as data sources. The retention curves and cohort-membership figures in this essay are composites built from advisory engagements with anonymized partner operators in SaaS, marketplace, and consumer-app archetypes. Where a specific vendor feature is referenced, it is sourced to public documentation.

The Default Cohort Is a Structural Artifact

Open any analytics dashboard and the default cohort is some flavor of sign-up month. Users who signed up in January 2025 form one row. Users who signed up in February form the next. The retention curve plots how many of each cohort were still active in week one, week two, week four, week twelve. The chart is familiar enough that most teams have stopped questioning what it actually measures.

What it measures is not what most operators think. The sign-up-month cohort is a structural fact about acquisition: it tells you who came through the door in a particular month. It is silent on what those people did once they were inside. A January cohort that contains a spike of paid-social acquisitions, a small organic tail, and a handful of referrals from a podcast episode is treated as a single unit by the default cohort grid. The downstream retention number is an average over three or four very different sub-populations, and the average is dominated by whichever sub-population is largest, not by whichever one is most economically interesting.

The honest version of cohort analysis takes the unit of analysis somewhere else. The unit is not "users who arrived in month X." The unit is "users who completed action set A inside window W." Action sets are behavioral: they describe what a user did, in what order, within how much elapsed time. The shift sounds small. In practice it changes the entire predictive content of the cohort table.

The first time a team replaces sign-up-month with a meaningful action-set cohort, two things tend to happen. The retention curves become more variable across cohorts: the differences that were averaged away inside the calendar slice surface as a wider spread. And the early-life retention numbers start moving in advance of the late-life retention numbers, by anywhere from two to ten weeks depending on the product. That second pattern is what the operator wants. It is the difference between a retention metric that confirms what already happened and a retention metric that warns about what is about to.

What the Literature Calls This (and What It Does Not)

The phrase "behavioral cohort" is the product-analytics vendor framing. The academic literature has more careful language. The classical cohort framework in epidemiology and demography defines a cohort as "a group of individuals who experience the same event during the same time period," where the defining event can be birth (a birth cohort), enrollment in a study, or exposure to a treatment. The product-analytics version inherits that frame but is sloppy about what the defining event actually is, partly because event taxonomies in software products are themselves sloppy.

Frederick Reichheld and W. Earl Sasser's Zero Defections: Quality Comes to Services (HBR, September-October 1990) and the broader Reichheld loyalty research established the operating logic for retention as the primary lever of long-term profitability. Reichheld's empirical finding, that a five-percentage-point reduction in defection rate could produce 25 to 95 percent higher profits depending on the industry, made the retention curve a board-level number. But Reichheld's framework treated the customer as a unit of analysis without much specification of what the customer had to have done to count as retained. The action-set framing is the operating refinement: a "retained" user is one who completed the defining behavior in the most recent period, not one who merely held an account.

The Heskett et al. service-profit chain (HBR, March-April 1994 original; the 2008 republication is more commonly cited) connected customer loyalty to employee satisfaction and internal service quality. The chain is causal in spirit but loose in measurement; the action-set cohort gives the chain a measurable left-hand side.

The product-analytics community's contribution is operational. Amplitude's documentation on behavioral cohorts defines them as "groups of users defined by the actions they take," with the explicit framing that cohort membership is computed from event streams rather than declared at sign-up. Mixpanel's similar feature, and Heap's automatic event capture model, describe the same shape.

What the literature is light on is the infrastructure required to compute action-set cohorts at the cadence the operating teams want. The academic frame defines the cohort and discusses what it should predict. The vendor documentation describes a button to click. Neither addresses the materialized views, de-duplication rules, and event-taxonomy stability that determine whether the cohorts you compute on Tuesday match the cohorts you compute on Friday. That is the part of this essay we want to spend the most time on.

The Shift From "Who Arrived" to "Who Did the Thing"

The conceptual shift is best illustrated by a side-by-side. Consider a SaaS product with three plausible cohort definitions for the same population of new sign-ups in a quarter.

Cohort A is the calendar cohort: every user who signed up in Q3 2025. It contains, in our composite example, 12,386 users.

Cohort B is the activation cohort: users who completed the product's onboarding sequence (workspace created, first integration connected, first export run) within seven days of sign-up. In the same population, 4,178 users qualify. About a third of the sign-up cohort.

Cohort C is the deeper action-set cohort: users who completed Cohort B's events AND added at least one collaborator AND ran the product on at least two distinct days in the first fourteen days. In the same population, 1,924 users qualify. About 16 percent of the original sign-up cohort, and about 46 percent of Cohort B.
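The Cohort B and Cohort C definitions above can be sketched as predicates over a single user's event log. This is a minimal illustration, not a reference implementation: the event names, the tuple layout, and the function names are hypothetical, and the sketch checks that the events occurred inside the window without enforcing their order.

```python
from datetime import datetime, timedelta

# Hypothetical event names for the onboarding sequence described above.
ONBOARDING = {"workspace_created", "integration_connected", "export_run"}


def in_cohort_b(signup, events, window_days=7):
    """True if every onboarding event occurred within window_days of sign-up.

    events is a list of (event_name, timestamp) tuples for one user.
    """
    cutoff = signup + timedelta(days=window_days)
    done = {name for name, ts in events if signup <= ts <= cutoff}
    return ONBOARDING <= done


def in_cohort_c(signup, events, window_days=14):
    """Cohort B plus a collaborator and two distinct active days in the window."""
    if not in_cohort_b(signup, events):
        return False
    cutoff = signup + timedelta(days=window_days)
    in_window = [(name, ts) for name, ts in events if signup <= ts <= cutoff]
    has_collaborator = any(name == "collaborator_added" for name, _ in in_window)
    active_days = {ts.date() for _, ts in in_window}
    return has_collaborator and len(active_days) >= 2
```

Keeping the window as an explicit parameter with a fixed default makes any later change to it visible in code review rather than buried in a dashboard filter.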

Three Cohort Definitions on the Same Q3 2025 Sign-Up Population (Advisory Partner Composite, n = 12,386)

Cohort	Definition	Members	Week 12 Retention	Week 26 Retention	Predictive of LTV?
Cohort A: sign-up month	All users who signed up in Q3 2025	12,386	26.7%	17.6%	Weakly. Average obscures sub-populations
Cohort B: activation	Onboarding sequence done in 7 days	4,178	61.2%	46.8%	Moderately. Better than A but still mixed
Cohort C: deeper action set	B + collaborator + 2 distinct active days	1,924	83.8%	71.2%	Strongly. Tightly correlated with month 12 ARPU

The week-twelve retention spread across the three cohorts (27 percent, 61 percent, 84 percent) is the relevant operating signal. A team looking at Cohort A's 27 percent number sees a generic retention curve that may or may not be a problem. The same team looking at Cohort C's 84 percent number sees the actual product-fit signal: among users who behaved like serious users in the first two weeks, the product retains them at industrial rates. The follow-on question is no longer "how do we improve retention" (too broad) but "how do we get more of Cohort A into Cohort C" (a specific funnel problem with measurable steps).

The shift is what Amplitude's product team has been arguing for since the Definitive Guide to Behavioral Cohorting was first published. The vendor framing is correct, even if the documentation is light on the harder infrastructure questions.

The Predictive Argument: Why Action-Sets Lead the Curve

A retention curve built on calendar cohorts moves slowly. By the time the curve has bent (twelve weeks, sixteen weeks, half a year), the underlying behavior has been settled for a while. The operating decision based on a calendar retention curve is always retrospective: the curve confirms that something happened, and the team picks up the pieces.

The predictive argument for action-set cohorts is that the membership of the cohort is observable inside the first one to two weeks of a user's life, and the cohort's downstream retention is tightly correlated with that membership. Whether a user completed the deeper action set is known by day fourteen; the week-twelve retention that membership predicts is not observed until week twelve. The leading-indicator gap is at least two weeks in the partner data we have observed, and in many products is longer.

Composite Retention Curves by Cohort Definition, Weeks 1 to 26 (Advisory Partner Composite, SaaS Archetype, 2023-2024)

The spread between the three curves is the operating signal. The flatter curve at the top of the chart (the deeper action-set cohort) is the population for whom the product is working. The steeper curve at the bottom is the population for whom it is not. The operating decisions that follow from this chart are about funnel improvements that move users from the bottom curve toward the top, not about generic retention improvements that try to bend all three curves at once.

The reason this is a leading rather than a lagging signal is that the cohort membership decision (did the user complete the action set within the window) is made within fourteen days. The downstream retention of those users is mechanically determined by that early-life behavior in a way that is robust across the cohorts we have observed in advisory work. The strength of the relationship varies by product, but the directional pattern is consistent.
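For readers who want to reproduce curves of this kind, the sketch below computes week-N retention for one cohort. It assumes a particular definition of "retained" (at least one qualifying event in days 7N through 7N+6 after sign-up); that definition, the data layout, and the function name are our assumptions for illustration.

```python
from datetime import date


def retention_curve(members, activity, weeks):
    """Fraction of cohort members active in each week after sign-up.

    members: dict of user_id -> sign-up date
    activity: dict of user_id -> set of dates with at least one qualifying event
    weeks: iterable of week numbers; week N covers days 7N .. 7N+6 after sign-up
    """
    curve = {}
    for week in weeks:
        retained = 0
        for user, signup in members.items():
            offsets = ((d - signup).days for d in activity.get(user, ()))
            if any(7 * week <= off < 7 * (week + 1) for off in offsets):
                retained += 1
        curve[week] = retained / len(members) if members else 0.0
    return curve
```

Running the same function over the calendar cohort, the activation cohort, and the deeper action-set cohort yields the three curves to compare; only the `members` argument changes.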

What Counts as an Action Set (and What Does Not)

The cohort is only as useful as the action set that defines it. Three practitioner habits separate action-set definitions that survive a year from those that fall apart in three months.

Habit one: the action set should be a hypothesis about value realization, not a list of clickable surfaces. A bad action set is "logged in three times, opened the settings page, viewed the pricing page." Those events are clickable but not diagnostic; they could be anyone wandering around. A good action set is "created the unit of work the product exists to produce, shared it with a second party, and returned within seven days." The good action set is shorter, sharper, and grounded in what the product is for.

Habit two: the action set should have a defined time window, and the window should be defensible against after-the-fact adjustment in either direction. A common failure mode is to gerrymander the window after the fact: the team observes that fourteen days produces a cleaner cohort than seven and quietly switches. The cohort retention now looks better because the window changed, not because the product did. The window should be set at the time the cohort is defined and changed only when the underlying product flow changes, not when the metric needs to be moved.

Habit three: the action set should be stable under event-taxonomy changes. This is the operational habit most often violated. The team adds a new onboarding step and quietly redefines the cohort to include it. Now the historical comparison is broken: users from six months ago could not have completed the new step, so the revised definition excludes them by construction. The honest move is to version the cohort: cohort definition v1 was the original, cohort definition v2 includes the new step, and the dashboards show both side by side for the transition period until v1 is sunset.
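One way to make habits two and three concrete is to treat each cohort definition as an immutable, versioned record. The sketch below is an assumption about how such a record might look; the class, field names, and event names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortDefinition:
    name: str
    version: int
    events: frozenset      # event names that must all occur
    window_days: int       # fixed when the version is created
    effective_from: date   # earliest sign-up date the version applies to


V1 = CohortDefinition(
    "activation", 1, frozenset({"project.created", "export.run"}), 7, date(2025, 1, 1)
)
# v2 adds a new onboarding step; v1 is left untouched so history stays comparable.
V2 = CohortDefinition(
    "activation", 2, V1.events | {"tour.completed"}, 7, date(2025, 7, 1)
)


def applicable_versions(signup, definitions):
    """Versions a user can fairly be evaluated against, oldest first."""
    return sorted(
        (d for d in definitions if d.effective_from <= signup),
        key=lambda d: d.version,
    )
```

A user who signed up before `V2.effective_from` is evaluated only against v1, which is what keeps the historical comparison honest during the transition period.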

Good and Bad Action-Set Definitions, Practitioner Examples

Product Type	Bad Action Set (Clickable Surfaces)	Better Action Set (Value Realization)	Time Window
B2B SaaS, project management	Logged in 3x, opened settings	Project created, second collaborator invited, second login on different day	14 days
Consumer photo app	Viewed feed twice	Photo edited, photo shared externally, returned next week	10 days
Marketplace	Browsed 5 listings	Listing viewed, message sent to seller, second session	7 days
Developer tool / API	Created API key, viewed docs	Successful API call to a non-test endpoint, integration tagged production, second call >24h later	21 days
Subscription media	Played a video	Watched >50% of a piece of content, returned within 4 days, watched a second piece	7 days

The cleanest internal test for an action-set definition is whether the team can articulate why each event in the set is in there, in plain language, without resorting to "because the dashboard looks better when we include it." Events that fail that test are decoration. Cohorts built from decorative event lists are not predictive; they are post-hoc rationalizations dressed as analytics.

The Infrastructure Layer: Event Taxonomy, Materialization, De-Duplication

The conceptual case for action-set cohorts is straightforward. The infrastructure case is where teams get stuck.

Event taxonomy. The cohort is a predicate over events. The events are emitted by the product. If the event stream is inconsistent (the same logical action firing under three different event names depending on whether the user is on web, iOS, or Android, with different payload shapes), the cohort predicate degrades silently. The first investment is taxonomy hygiene: a documented event catalog, owned by an analytics-engineering function, with a single source of truth that the product team is required to update when a new feature ships. Without this, the cohort is built on sand.

The practitioner convention we have seen work best is a three-part event name (object.action.qualifier, for example project.created.from_template), a documented payload schema for each event, and a deprecation policy that retains old event names for at least one full retention window after a rename. Anything less and the cohort predicates that worked in March will not work in September, not because the product changed but because the events did.
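A small validator can enforce the naming convention and the deprecation policy at ingestion time. The sketch below is illustrative: the lowercase snake_case rule, the alias table, the old event name, and the retirement date are all hypothetical.

```python
import re
from datetime import date

# object.action.qualifier, lowercase snake_case segments (an assumed style rule).
EVENT_NAME = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")

# Hypothetical rename table: old name -> (canonical name, date the alias retires).
ALIASES = {
    "project_create_tpl": ("project.created.from_template", date(2026, 1, 1)),
}


def canonical_name(raw, today):
    """Map a raw event name to its canonical form, or raise if it does not conform."""
    if raw in ALIASES:
        new_name, retire_on = ALIASES[raw]
        if today < retire_on:
            return new_name
        raise ValueError(f"alias {raw!r} retired on {retire_on}")
    if not EVENT_NAME.match(raw):
        raise ValueError(f"{raw!r} is not object.action.qualifier")
    return raw
```

Cohort predicates then reference only canonical names, so a rename changes the alias table rather than every downstream query.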

Materialization. The naive way to compute a behavioral cohort is to run a SQL query over the raw event table every time a dashboard loads. This is fine for a small product. It does not scale. The mature pattern is to materialize cohort membership into a dedicated table, refreshed on a defined schedule (hourly, daily) by a deterministic job, with the materialization logic version-controlled in dbt or an equivalent transformation framework. Every dashboard, every Slack alert, every machine-learning feature that depends on cohort membership reads from the materialized table, not from the raw events. This is the discipline that turns cohorts from one-off SQL into a reliable measurement system.
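The materialization pattern can be illustrated with a small, idempotent refresh job. The sketch uses Python's built-in sqlite3 module purely for illustration, in place of dbt and a warehouse; the table name, columns, and function names are hypothetical.

```python
import sqlite3


def refresh_cohort(conn, cohort, version, member_ids, as_of):
    """Idempotently replace one cohort version's membership inside a transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """CREATE TABLE IF NOT EXISTS cohort_membership (
                   cohort TEXT, version INTEGER, user_id TEXT, refreshed_at TEXT,
                   PRIMARY KEY (cohort, version, user_id))"""
        )
        conn.execute(
            "DELETE FROM cohort_membership WHERE cohort = ? AND version = ?",
            (cohort, version),
        )
        conn.executemany(
            "INSERT INTO cohort_membership VALUES (?, ?, ?, ?)",
            [(cohort, version, uid, as_of) for uid in sorted(set(member_ids))],
        )


def cohort_size(conn, cohort, version):
    """What dashboards and alerts would read instead of scanning raw events."""
    row = conn.execute(
        "SELECT COUNT(*) FROM cohort_membership WHERE cohort = ? AND version = ?",
        (cohort, version),
    ).fetchone()
    return row[0]
```

Because the refresh deletes and rewrites a single (cohort, version) slice, running it twice produces the same table, and two definition versions can be materialized side by side.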

The action-set cohort materialization pipeline


De-duplication. The least glamorous and most consequential part of the infrastructure. The same user often appears in the event stream under multiple identifiers: an anonymous device ID before sign-up, a user ID after, a different anonymous ID on a second device, sometimes a separate identifier after a merge. If the cohort predicate counts events without resolving identities, the same user can be inserted into the cohort more than once, or can be missed entirely because the qualifying events were emitted under different identifiers and no single identifier completes the action set.

The de-duplication problem is technically a graph problem (the identity graph), and it interacts with the cohort predicate in subtle ways. A user who completed the first event under anonymous ID X and the second event under user ID Y will satisfy the cohort predicate if and only if X and Y are resolved to the same identity. If the identity resolution is incomplete, the cohort under-counts; if it is too aggressive (collapsing two genuinely different users into one), the cohort over-counts.
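The interaction between identity resolution and the cohort predicate can be shown with a union-find sketch. This is one common way to resolve an identity graph, not a claim about any vendor's method; the identifiers, event names, and function names are hypothetical, and the time window is omitted for brevity.

```python
def resolve_identities(links):
    """Union-find over identifier pairs; returns a function mapping any id to its canonical id."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic: smaller id becomes the root

    return find


def users_completing(events, required, find):
    """Canonical ids whose merged event set covers every required event."""
    seen = {}
    for identifier, name in events:
        seen.setdefault(find(identifier), set()).add(name)
    return {uid for uid, names in seen.items() if required <= names}
```

With no link between anonymous ID X and user ID Y, neither identifier satisfies the predicate and the cohort under-counts; once the link is supplied, the merged identity qualifies exactly once.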

Retention Curves Built On Action Sets: Reading Them Honestly

A retention curve built on calendar cohorts is read by comparing the week-N retention of one cohort to the week-N retention of the next, looking for trend. A retention curve built on action-set cohorts requires a different reading discipline.

First, the curve is conditional on cohort membership. The week-twenty-six retention of the deeper action-set cohort (the 71 percent figure in our composite) is not the product's retention. It is the retention of users who completed the deeper action set within fourteen days. The product's retention, integrated over the full sign-up population, is the calendar number (18 percent in the same example). Both numbers are correct and both matter, but they answer different questions.

Second, the spread between the calendar curve and the action-set curve is itself a metric. The "activation gap" (the difference between the calendar cohort's retention and the activation cohort's retention) measures how much the product depends on early behavior to retain users. A product where the gap is small (calendar 18 percent, activation 28 percent) is one where the early behavior does not predict much; the product is either uniformly good or uniformly indifferent. A product where the gap is large (calendar 18 percent, activation 47 percent, as in the week-26 figures of our composite) is one where the early funnel is everything, and the operating leverage is in moving users across the activation threshold.

The Activation Gap: Week 26 Retention by Cohort, Across Five Advisory Partner Products (2023-2024)

Third, the action-set cohort's curve should be read for shape, not just for level. A flat curve from week four onward suggests that whatever the action set captures is sufficient: once a user is in, they stay in. A curve that continues to decline through weeks twelve, twenty-six, and beyond suggests that the action set is necessary but not sufficient; there is a second behavior, captured by a deeper action set, that is the actual loyalty mechanism. The diagnostic move is to define progressively deeper action sets and see which one produces the flat curve. The cohort that flattens is the cohort that captures the value-realization moment.

In advisory work, the most reliable diagnostic for product-market fit is not survey-based or NPS-based. It is the shape of the retention curve for the deepest defensible action-set cohort. A product where that curve is genuinely flat by week eight has product-market fit on the dimension the action set captures. A product where that curve is still bending downward at week twenty-six does not, and no amount of marketing investment will fix it; the fix has to come from inside the product.
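Reading a curve for shape can be made explicit with a simple flatness check. The sketch below measures the relative decline after a chosen week; the 5 percent tolerance and the default starting week are our assumptions, not thresholds drawn from the partner data.

```python
def tail_decline(curve, from_week):
    """Relative drop in retention between from_week and the last observed week."""
    weeks = sorted(w for w in curve if w >= from_week)
    if len(weeks) < 2 or curve[weeks[0]] == 0:
        raise ValueError("need at least two observed weeks with non-zero starting retention")
    start, end = curve[weeks[0]], curve[weeks[-1]]
    return (start - end) / start


def is_flat(curve, from_week=4, tolerance=0.05):
    """True if the curve loses no more than `tolerance` of its level after from_week."""
    return tail_decline(curve, from_week) <= tolerance
```

Applied to the composite deeper action-set cohort (83.8 percent at week 12, 71.2 percent at week 26), the relative decline is about 15 percent, so under this tolerance the curve would not count as flat and a deeper action set would be worth testing.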

The Operating Cadence: How Often to Recompute, How Often to Redefine

There are two different refresh cadences in an action-set cohort system, and conflating them is one of the easier ways to break the trust the team places in the numbers.

The first is recomputation cadence: how often the materialized cohort view is refreshed. This is mechanical and should be on a fixed schedule, typically daily for most cohorts and hourly for cohorts feeding real-time use cases (an in-product nudge, a sales alert, a churn intervention). The recomputation cadence is determined by the freshness requirements of the downstream consumer, not by the convenience of the analytics team.

The second is definition cadence: how often the action set itself (the events that qualify a user) is revised. This should be much slower. A cohort definition should be stable for at least one full retention window (so that the historical comparison remains meaningful), and ideally for a year or longer. When a definition changes, the old definition continues to be materialized in parallel for a transition period (we have used six months as a default), so that the dashboards do not snap from one set of numbers to another overnight.

The temptation, when the team is unhappy with a cohort's numbers, is to redefine the cohort. This is almost always the wrong move. The cohort is a measurement instrument; redefining it on the fly is equivalent to changing the units on the scale because the weight does not look right. The right move is to keep the definition and investigate the underlying behavior. If the cohort is genuinely getting smaller (fewer users qualifying), the question is why the product flow is filtering more users out. If the cohort is qualifying but not retaining, the question is what changed downstream of the action set.

Recomputation vs Redefinition Cadences, Practitioner Defaults

Operation	Typical Cadence	Owner	Failure Mode If Wrong Cadence
Cohort recomputation (materialized refresh)	Daily for most; hourly for real-time triggers	Analytics engineering	Stale numbers; downstream alerts fire on yesterday's world
Cohort definition revision	At most once per retention window, ideally no more than once per year; old version in parallel for 6 months	Analytics + product, with sign-off	Historical comparison broken; metric appears to move when only the ruler moved
Event taxonomy additions	On product feature ship, with documentation	Analytics engineering + product engineering	Predicates drift silently; cohort under- or over-counts
Identity-graph re-resolution	Weekly, with backfill for affected cohorts	Data platform	De-duplication degrades; same user counted twice or not at all
Dashboards / Slack alerts	Read from materialized views, not raw events	Whoever ships the alert	Inconsistent numbers across surfaces; trust erodes

The practitioner shorthand: refresh often, redefine rarely. The first is operational hygiene. The second is, every time, an accidental rewrite of history.

Where Action-Set Cohorts Break Down

Two failure modes are worth naming, because they are not solved by better infrastructure.

Failure mode one: the action set captures a behavior that is correlated with retention but is not causally related to it. A classic example: users who set up two-factor authentication retain better than users who do not, so a cohort defined on 2FA setup shows a flat retention curve. But the 2FA setup did not cause the retention. The retention caused the 2FA setup: users who were already serious about the product invested the additional minute to secure their accounts. Building product nudges to push more users to set up 2FA will not move retention, because the cohort membership was a marker of the underlying intent rather than a driver of it. The diagnostic is whether the action-set predicate is something the user would do unprompted; if it requires significant friction that filters by intent rather than producing it, the cohort is descriptive but not actionable.

Failure mode two: the action set is gameable by product changes that move the cohort numbers without changing the underlying behavior. If the cohort qualifying event is "added a collaborator," and the product team adds a default "you have one collaborator already" placeholder in the onboarding flow, the cohort qualification rate spikes overnight. The numbers look like the team improved activation by 40 percent. They did not; they moved the qualifying line. The defense against this is a regular cohort-definition audit, run by an analytics team that did not ship the change, that flags qualification-rate spikes for behavioral causation review.
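The audit described above needs a trigger. One simple option, sketched here under our own assumptions, is to flag any period whose qualification rate sits far above a trailing baseline; the eight-period baseline and the z-score threshold of 3 are illustrative defaults, not recommendations drawn from the engagements.

```python
from statistics import mean, stdev


def flag_qualification_spikes(rates, baseline_weeks=8, z_threshold=3.0):
    """Return indices of periods whose rate sits far above the trailing baseline.

    rates: qualification rate per period, oldest first (qualified users / sign-ups).
    """
    flagged = []
    for i in range(baseline_weeks, len(rates)):
        baseline = rates[i - baseline_weeks:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (rates[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged
```

A flag is only a prompt for review: it says the rate moved abruptly, not whether behavior changed or the qualifying line did. Note also that a spike, once it enters the trailing baseline, raises the bar for later periods.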

Both failure modes share a structure. The cohort is a measurement instrument, and instruments that get touched too easily by the people whose performance they measure lose their value as instruments. Goodhart's Law applies to cohort definitions in exactly the way it applies to any other operating metric: when a measure becomes a target, it ceases to be a good measure. The discipline that resists this is separation of concerns: the team that defines the cohort is not the team that owns the metric, and the cohort definitions are versioned, dated, and signed off.

Action-Set Cohorts and the LTV Stack

A retention curve is one consumer of the cohort. A lifetime-value model is the other. The LTV calculation depends on the survival function, the revenue per user per period, and the discount rate. The survival function is the cohort retention curve. Once the cohort is action-set rather than calendar, the LTV becomes conditional on action-set membership, and the operating implications change.

The calendar LTV (LTV for the average user who signed up in Q3 2025) is a blended number. It averages high-LTV users who completed the deeper action set with low-LTV users who never activated. The blended number is useful for payback-period and unit-economics calculations at the portfolio level, but it is misleading as a basis for acquisition-channel investment, because different channels deliver different mixes of action-set membership.

The action-set LTV (LTV conditional on completing the deeper action set within fourteen days) is a sharper number. It tells you what a "real" user is worth. The product's economic problem then decomposes cleanly into two questions: how many sign-ups become action-set members (the conversion problem) and what are those members worth (the LTV-on-members problem). Each question has a different owner and a different operating cadence. Conflating them, which is what calendar-LTV does, hides the fact that they need different interventions.
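The relationship between the survival function, the blended calendar LTV, and the LTV-on-members number can be written down directly. The sketch assumes constant revenue per period and a constant per-period discount rate, which real models usually relax; the function names are hypothetical.

```python
def ltv(survival, revenue_per_period, discount_rate):
    """Discounted lifetime value: sum over periods of survival * revenue / (1 + r)^t."""
    return sum(
        s * revenue_per_period / (1 + discount_rate) ** t
        for t, s in enumerate(survival)
    )


def blended_ltv(member_share, ltv_members, ltv_non_members):
    """Calendar LTV as a mix of action-set members and everyone else."""
    return member_share * ltv_members + (1 - member_share) * ltv_non_members
```

Here `survival` is the cohort's retention curve, with `survival[0]` the share active in the first period. The second function makes the decomposition explicit: a channel can change the blended number by shifting `member_share` without any change in what a member is worth.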

The cohort-based unit-economics framework we have written about elsewhere extends cleanly to action-set cohorts. The substitution is mechanical: replace the calendar cohort index with the action-set cohort index, recompute the per-cohort P&L, and the patterns surface earlier. In a SaaS company we worked with in 2024, the action-set unit economics surfaced a deterioration in newer cohorts roughly twelve weeks before the calendar-cohort numbers showed it, because the action-set membership rate had been falling for two months in a way that the blended retention curve had absorbed.

A retention curve built on action sets does not just measure retention. It measures, week by week, whether the product is still doing for new users what it did for old users. The calendar curve cannot make that distinction.

A Twelve-Week Migration: What Changes In Order

For a team running on calendar-cohort retention today and considering the shift to action-set cohorts, the migration order matters. The pieces interact, and a partial migration is often worse than no migration at all.

1. Event-taxonomy audit and stabilization. Before any cohort work, audit the event stream. Document the canonical event names. Resolve duplicates and ambiguities. Establish a deprecation policy. Without this step, every downstream cohort is built on inconsistent inputs.
2. Identity graph hardening. Establish the resolved-identity table that joins anonymous IDs, user IDs, and any other identity surfaces. Document the resolution rules. Establish a re-resolution cadence. Backfill the resolved identity into the historical event stream. Cohorts built on unresolved IDs will produce numbers that do not survive scrutiny.
3. First action-set cohort: the activation cohort. Define the activation cohort (one tight action set, defensible time window) and materialize it. Run it in parallel with the calendar cohort for one full retention window. Compare the curves. The team should converge on what the activation cohort is telling them before adding more cohorts.
4. Cohort definition versioning and change-control process. Establish the discipline before the second cohort is added. Pull requests for cohort changes, parallel materialization of old and new definitions, dated changelogs.
5. Deeper action-set cohorts. Once activation is stable, define progressively deeper cohorts. Each one should be tied to a hypothesis about value realization, not a generic engagement proxy.
6. Migration of downstream consumers. Dashboards, alerts, ML features, retention reports. Each consumer is migrated from the calendar cohort to the action-set cohort one at a time, with both running in parallel during the transition.
7. Sunset of calendar-only views. Once consumers are migrated, retire the calendar-only views. Keep one calendar curve as a sanity check; do not let it disappear entirely, because it is the broadest population view.

The full migration takes one to two quarters for a mid-sized analytics organization, longer if the event taxonomy is in worse shape than the team initially believes (which it almost always is). The teams that move faster than this generally cut corners on identity resolution or definition versioning, and the cost surfaces six months later when the numbers stop reconciling.

Key Takeaways

The default cohort in most analytics tools is a sign-up calendar slice, which is a structural fact about acquisition, not a behavioral fact about the user. The retention curves it produces are averages over heterogeneous sub-populations and are weak as operating instruments.
Action-set cohorts (users who completed a defined sequence of events inside a defined window) replace the calendar slice with a behavioral predicate. The cohort membership is observable within one to two weeks and predicts downstream retention with materially higher fidelity.
The strength of the predictive relationship is not magic. It is mechanical: users who completed the action set are users who realized value early, and value realization in the first two weeks is the dominant input to long-term retention in most products we have observed.
The infrastructure cost is real. An action-set cohort requires a stable event taxonomy, a resolved identity graph, materialized cohort views on a defined refresh cadence, and a versioned cohort-definition process. None of this is technically hard, but all of it requires discipline that most organizations underinvest in.
The single most common failure mode is gerrymandering the cohort definition after the fact to improve the numbers. The discipline that prevents this is treating cohort definitions like database schemas: versioned, dated, signed off, with old versions kept in parallel for the transition period.
Recomputation cadence (daily or hourly, the materialized refresh) is operational. Definition cadence (when the action set itself is revised) is strategic and should be slow: at most once per retention window, with the old definition kept in parallel for six months.
The activation gap (the spread between the calendar cohort retention and the activation cohort retention) is itself a diagnostic metric. A large gap means the early funnel is the dominant lever; a small gap means early behavior predicts little, so funnel optimization is not where the leverage is.
Action-set cohorts plug cleanly into the broader LTV stack. The substitution is mechanical, and the resulting LTV-on-members number is a sharper basis for acquisition-channel investment than the blended calendar LTV.
Goodhart's Law applies. A cohort that the operating teams can change too easily ceases to measure anything stable. The organizational defense is separation of concerns: the team that defines the cohort is not the team whose performance the cohort measures.
Migration order matters. Taxonomy first, identity second, activation cohort third, change-control discipline fourth, deeper cohorts fifth. Teams that move faster than this discover within six months that their numbers no longer reconcile.