TL;DR: Demographic segmentation fails for product innovation because three customers buying the same drill may have zero overlap in their reasons for purchase. NLP applied to millions of customer reviews can surface Jobs-to-Be-Done at a fraction of the cost of traditional qualitative research ($100,000 to $300,000 per study), discovering jobs that interview subjects cannot articulate, because the language patterns reveal the circumstance of purchase, not just the product preference.
Nobody Buys a Drill
A 34-year-old woman in Phoenix buys a $400 cordless drill from Home Depot. A 58-year-old retired contractor in Maine buys the same drill. A 26-year-old apartment renter in Brooklyn buys the same drill.
Your demographic segmentation sees three entirely different customers. Three different personas. Three different marketing strategies. Three different ad creatives.
Your demographic segmentation is wrong about all of them.
The woman in Phoenix is hanging shelves in a nursery before her first child arrives. She needs the job done before the due date. The retired contractor is building a deck — something he has done a hundred times but now does for pleasure instead of pay. The Brooklyn renter is assembling IKEA furniture with a friend and wants the fastest way through 47 cam lock screws.
Three customers. Same product. Three different jobs. Zero overlap in the reason for purchase.
This is the central insight of Clayton Christensen's Jobs-to-Be-Done theory, first articulated in The Innovator's Solution (2003) and later expanded in Competing Against Luck (2016): customers do not buy products. They hire products to make progress in a specific circumstance. The "job" — the progress the customer is trying to make — is the true unit of market segmentation. Not age. Not income. Not psychographic cluster.
The problem, for 20 years, has been discovering those jobs at scale. Christensen's method relied on qualitative interviews — deep, time-consuming, expensive conversations with individual customers. A proper JTBD study costs $100,000 to $300,000 and takes 8 to 16 weeks to complete.
Natural language processing changes this equation. Millions of customers describe their jobs every day, unprompted, in product reviews. They tell you why they bought. What circumstance triggered the purchase. Whether the product did the job. And what the product failed to do.
They just don't use the word "job."
Why Demographic Segmentation Fails for Innovation
Demographic segmentation works for media buying. If you want to reach people who watch NFL games, buying ads during NFL games is a reasonable strategy. Age, gender, income, and geography predict media consumption well enough for placement decisions.
Demographic segmentation does not work for understanding why people buy things. The correlation between who someone is and what job they need done is weak and often misleading.
Christensen's milkshake study is the canonical example. McDonald's wanted to sell more milkshakes. They segmented customers demographically, ran focus groups, adjusted the product based on demographic preferences. Sales did not move. When Christensen's team observed actual purchasing behavior, they discovered two entirely distinct jobs. Morning commuters hired the milkshake to make a boring commute more interesting and fill their stomach until lunch. Afternoon parents hired the milkshake to give their child a treat without the sugar guilt of a candy bar.
Same product. Same store. Two different jobs. The morning commuters wanted the milkshake thicker and with chunks — to last longer on the drive. The afternoon parents wanted it smaller and thinner — so the child could finish it before their patience ran out.
No demographic variable predicted which job a customer was hiring the milkshake to do. Time of day did. Circumstance did. Demographics did not.
The failure extends beyond milkshakes. Ulwick (2005) studied 400 product development initiatives and found that teams using demographic segmentation achieved a 17% success rate for new products. Teams using job-based segmentation achieved an 86% success rate. The difference is not small. It is the difference between a viable innovation practice and an expensive lottery.
The gap between needs-based segmentation (48%) and JTBD segmentation (86%) is worth examining. Needs-based segmentation asks what the customer wants. JTBD segmentation asks what the customer is trying to accomplish and under what circumstances. The difference sounds semantic. It is structural. A customer "needs" a faster laptop. A customer "is trying to" render video footage on deadline while traveling with unreliable Wi-Fi. The first statement gives you a spec. The second gives you a product.
The Job as the Unit of Analysis
A "job" in the JTBD framework has specific properties that distinguish it from a "need," a "want," or a "use case."
A job is functional, emotional, and social simultaneously. The functional job of a business suit is covering your body in professional attire. The emotional job is feeling confident walking into a high-stakes meeting. The social job is signaling competence and status to people who will judge you in the first seven seconds. No product succeeds on functional dimensions alone.
A job is stable over time. People have been hiring products to "help me feel confident in professional settings" for centuries. The solutions change — powdered wigs, top hats, power suits, Patagonia vests — but the job persists. This stability is what makes JTBD valuable for long-term product strategy. You are not chasing a trend. You are serving a permanent human motivation.
A job has a circumstance. "Help me get from point A to point B" is not a job. It is a vague aspiration. "Help me get from my apartment to my office 3.2 miles away in under 20 minutes when it's raining and I need to arrive looking presentable" — that is a job. The circumstance constrains the solution space and determines which products compete against each other. In this job, a car, an Uber, and a covered bicycle all compete. A skateboard does not.
A job has desired outcomes that can be measured. Ulwick's Outcome-Driven Innovation (ODI) framework operationalizes this by decomposing each job into 50 to 150 specific outcome statements — structured as "minimize the time it takes to [outcome]" or "minimize the likelihood that [negative outcome]." These outcome statements become the variables against which you measure satisfaction and importance.
Anatomy of a Job-to-Be-Done: The Morning Commute Milkshake
| Dimension | Description | Example |
|---|---|---|
| Functional Job | The core task the customer is trying to complete | Make the commute less boring and keep me full until lunch |
| Emotional Job | How the customer wants to feel during and after | Feel like I am treating myself; feel energized not sluggish |
| Social Job | How the customer wants to be perceived by others | Not relevant (solo consumption in car) |
| Circumstance | The specific situation triggering the job | Driving alone, 30-minute commute, early morning, one hand free |
| Desired Outcome 1 | Measurable success criterion | Minimize the time I feel hungry before lunch |
| Desired Outcome 2 | Measurable success criterion | Minimize the likelihood I spill while driving |
| Desired Outcome 3 | Measurable success criterion | Minimize the boredom during the drive |
| Competing Solutions | Other products hired for the same job | Banana, bagel, Snickers bar, breakfast burrito, podcast + coffee |
The competing solutions row is critical. When you define the market by the job instead of the product category, the competitive set changes entirely. A milkshake does not compete with other milkshakes. It competes with bananas. With bagels. With anything else that can make a morning commute less boring and keep someone full. This reframing is where the strategic value of JTBD lives.
Traditional JTBD Research: Expensive and Slow
Christensen's original method for discovering jobs was the "forces of progress" interview. A trained interviewer spends 45 to 90 minutes with a customer, reconstructing the timeline of their purchase decision. The interview works backward from the moment of purchase, uncovering the specific circumstances, the first thought that a change was needed, the push of dissatisfaction with the current solution, the pull of the new solution, the anxiety of switching, and the habits of the present that resist change.
This method works. It produces deep, accurate understanding of the job. A skilled interviewer can uncover jobs that customers themselves could not have articulated in a survey.
It also costs somewhere between $2,000 and $5,000 per interview when you account for recruiting, scheduling, conducting, transcribing, and analyzing. A proper study requires 30 to 60 interviews to reach theoretical saturation — the point at which new interviews stop producing new jobs. Total cost for a single market: $100,000 to $300,000. Timeline: 8 to 16 weeks.
For large corporations with R&D budgets in the millions, the cost is manageable. For mid-market companies, startups, and product teams within larger organizations competing for budget, it is prohibitive. Quantifying product-market fit faces the same research cost problem — understanding whether your product satisfies the job well enough to drive retention requires either expensive interviews or scalable measurement systems. The result is predictable: most companies that would benefit from JTBD research never conduct it. They know the theory. They read Christensen's books. They never do the interviews. They default to demographic segmentation because it is cheap and available, not because it is correct.
This is a market failure. The information exists. Customers describe their jobs constantly — in product reviews, in support tickets, in forum posts, in social media threads. They just describe them in natural language, buried in millions of documents, mixed with noise, and scattered across dozens of platforms.
Extracting this information requires NLP. The same language models powering LLM-based catalog enrichment — extracting structured attributes from unstructured text — can be repurposed to extract job statements from customer reviews at scale.
NLP for Automated Job Discovery
The core insight behind NLP-based job discovery is this: when customers write product reviews, they frequently describe the circumstance that triggered the purchase, the job they hired the product to do, and whether the product succeeded or failed at that job. They do this voluntarily, without prompting, and in their own words.
A typical Amazon review for a noise-canceling headphone:
"Bought these for my open-plan office. My coworkers are loud and I couldn't focus on deep work. These block about 80% of the chatter — enough to get into flow state. Battery lasts my whole workday. Only complaint is they get uncomfortable after about 3 hours."
This single review contains: a circumstance (open-plan office), a functional job (block noise to enable deep work), a desired outcome that was met (blocks 80% of chatter, lasts full workday), and a desired outcome that was not met (comfort over extended periods). A JTBD researcher would need a 60-minute interview to extract this. The customer volunteered it in 47 words.
Multiply this by 500,000 reviews across a product category, and you have a dataset that no interview study can match in scale. The question is whether NLP can extract the job-relevant information reliably.
The answer, as of 2025, is yes — with caveats.
The pipeline for NLP-based JTBD research follows a consistent architecture:
- Data collection — Scrape or API-access reviews from Amazon, G2, Trustpilot, app stores, Reddit, and category-specific forums. A typical category analysis requires 50,000 to 500,000 reviews.
- Preprocessing — Remove duplicates, filter for reviews with sufficient length (>30 words), detect language, handle encoding issues.
- Job-relevant sentence extraction — Not every sentence in a review describes a job. A classifier (fine-tuned BERT or a prompted LLM) identifies sentences that describe circumstances, motivations, or outcomes.
- Topic modeling — Cluster the job-relevant sentences into coherent themes. Each cluster represents a candidate job.
- Sentiment analysis per cluster — Determine satisfaction levels for each discovered job.
- Importance estimation — Estimate how important each job is based on frequency of mention and language intensity.
- Gap analysis — Map each job on a satisfaction-importance matrix to identify underserved jobs.
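The first two stages of the pipeline can be sketched in a few lines of plain Python. The 30-word minimum comes from the preprocessing step above; the hash-based exact-duplicate check and everything else here are illustrative assumptions, not a production scraper.

```python
import hashlib

def preprocess(reviews, min_words=30):
    """Deduplicate and length-filter raw review texts (pipeline steps 1-2).

    `reviews` is a list of raw review strings. A hash set catches exact
    duplicates after whitespace/case normalization; reviews shorter than
    `min_words` carry too little job signal and are dropped.
    """
    seen, kept = set(), []
    for text in reviews:
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier review
        seen.add(digest)
        if len(normalized.split()) >= min_words:
            kept.append(text.strip())
    return kept

corpus = [
    "Bought these for my open-plan office. " * 5,  # 30 words: kept
    "Bought these for my open-plan office. " * 5,  # duplicate: dropped
    "Great headphones!",                           # too short: dropped
]
print(len(preprocess(corpus)))  # 1
```

In practice the duplicate check would also catch near-duplicates (e.g., via MinHash), since review spam is rarely byte-identical.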
Topic Modeling: LDA vs. BERTopic on Review Corpora
Topic modeling is the engine that converts a corpus of job-relevant sentences into discrete, interpretable job clusters. Two approaches dominate: Latent Dirichlet Allocation (LDA) and BERTopic.
LDA (Blei, Ng, & Jordan, 2003) is the older method. It treats each document as a mixture of topics and each topic as a distribution over words. It works well when the corpus is large, the vocabulary is distinct across topics, and the number of topics is specified in advance. For JTBD analysis, LDA's main advantage is interpretability — each topic is a list of high-probability words, which human analysts can map to jobs.
LDA's weaknesses are significant for JTBD work. It relies on bag-of-words representations and misses semantic relationships. "Blocks noise" and "reduces ambient sound" are different word sequences that describe the same job dimension. LDA may split them into separate topics or merge them with unrelated topics that share common words.
BERTopic (Grootendorst, 2022) addresses this by replacing bag-of-words with transformer-based sentence embeddings. The pipeline works in four steps: (1) encode each sentence into a dense vector using a sentence transformer model, (2) reduce dimensionality with UMAP, (3) cluster with HDBSCAN, (4) extract topic representations using c-TF-IDF. Because the clustering operates on semantic embeddings rather than word counts, BERTopic correctly groups "blocks noise" with "reduces ambient sound" — they have similar embeddings because they mean similar things. This is the same embedding principle underlying transformer-based product representations — semantic proximity in learned embedding spaces captures meaning that bag-of-words approaches miss entirely. In Step 4, topic representations are extracted via class-based TF-IDF:
$$W_{t,c} = \frac{\mathrm{tf}_{t,c}}{w_c} \cdot \log\frac{m}{n_t}$$

where $\mathrm{tf}_{t,c}$ is the frequency of term $t$ in class $c$, $w_c$ is the total words in class $c$, $m$ is the total number of documents, and $n_t$ is the number of classes containing term $t$.
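A minimal pure-Python sketch of the class-based TF-IDF step, following the formula as defined in the surrounding text (note that BERTopic's actual implementation uses a smoothed variant, `log(1 + A/f_t)`; the cluster data below is illustrative):

```python
import math
from collections import Counter

def ctfidf(classes):
    """Class-based TF-IDF over clustered sentences (pipeline step 4).

    `classes` maps a cluster id to its list of tokenized sentences.
    All sentences in a cluster form one class-level "document":
    W[t, c] = (tf[t, c] / w_c) * log(m / n_t).
    """
    term_freq = {c: Counter(tok for sent in sents for tok in sent)
                 for c, sents in classes.items()}
    m = sum(len(sents) for sents in classes.values())  # total documents
    n = Counter()                                      # classes containing term t
    for tf in term_freq.values():
        n.update(tf.keys())
    weights = {}
    for c, tf in term_freq.items():
        w_c = sum(tf.values())                         # total words in class c
        weights[c] = {t: (f / w_c) * math.log(m / n[t])
                      for t, f in tf.items()}
    return weights

clusters = {
    0: [["blocks", "noise"], ["reduces", "ambient", "sound"]],
    1: [["battery", "lasts", "all", "day"], ["battery", "life"]],
}
w = ctfidf(clusters)
top = max(w[1], key=w[1].get)
print(top)  # 'battery'
```

The highest-weighted terms per cluster become the topic representation an analyst reads when labeling a cluster as a job.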
LDA vs. BERTopic for JTBD Discovery: Head-to-Head Comparison
| Dimension | LDA | BERTopic |
|---|---|---|
| Semantic Understanding | None — bag-of-words only | High — sentence-level embeddings capture meaning |
| Number of Topics | Must be specified in advance (hyperparameter) | Determined automatically by HDBSCAN clustering |
| Short Document Performance | Poor — insufficient word co-occurrence signal | Good — embeddings work even on single sentences |
| Interpretability | High — topic = word probability distribution | Medium — requires c-TF-IDF extraction step |
| Computational Cost | Low — runs on a laptop for 500K documents | Medium — embedding step requires GPU for speed |
| Handling of Synonyms | Poor — treats synonyms as different words | Good — synonyms have similar embeddings |
| Hierarchical Topics | Not natively supported | Supported — can merge or split topics post-hoc |
| Recommended For | Quick exploration, very large corpora, limited compute | Production JTBD analysis, semantic precision matters |
Topic coherence is measured using the normalized pointwise mutual information (NPMI) score, computed as:

$$\mathrm{NPMI}(w_i, w_j) = \frac{\log \frac{P(w_i, w_j)}{P(w_i)\,P(w_j)}}{-\log P(w_i, w_j)}$$

where $w_i$ and $w_j$ are drawn from the top-$N$ words in a topic. Higher values indicate more semantically coherent clusters. In practice, BERTopic produces more coherent job clusters on review data. A comparison study by Egger and Yu (2022) on TripAdvisor hotel reviews found that BERTopic produced topics with higher coherence scores (0.62 vs. 0.41 for LDA) and required less manual post-processing to arrive at interpretable job categories.
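The coherence metric can be sketched directly from document co-occurrence counts. A minimal version, assuming document-frequency probability estimates (libraries like gensim offer sliding-window variants):

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents, eps=1e-12):
    """Mean NPMI over all pairs of a topic's top words.

    P(w) is the share of documents containing w; P(wi, wj) the share
    containing both. NPMI ranges from -1 (never co-occur) to +1
    (always co-occur).
    """
    docs = [set(d) for d in documents]
    n = len(docs)

    def p(*words):
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = p(wi, wj)
        if p_ij == 0:
            scores.append(-1.0)  # pair never co-occurs: minimum NPMI
            continue
        pmi = math.log(p_ij / (p(wi) * p(wj) + eps))
        scores.append(pmi / (-math.log(p_ij) + eps))
    return sum(scores) / len(scores)

docs = [["noise", "office", "focus"], ["noise", "office"], ["battery", "life"]]
coherent = npmi_coherence(["noise", "office"], docs)   # words always co-occur
mixed = npmi_coherence(["noise", "battery"], docs)     # words never co-occur
print(coherent > mixed)  # True
```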
The typical BERTopic pipeline for JTBD analysis produces 15 to 40 initial clusters from a corpus of 100,000+ job-relevant sentences. An analyst then merges semantically similar clusters and discards noise clusters (reviews about shipping, packaging, or other non-job content), arriving at 8 to 20 distinct jobs per product category.
Run on 287,000 reviews in the noise-canceling headphone category, this pipeline surfaces roughly ten distinct jobs, from the expected (block airplane noise, focus in an open-plan office, enjoy music) to the unexpected: creating a private space in shared living and protecting hearing in loud workplaces. Those last two jobs rarely surface in traditional JTBD interviews. They represent underserved populations who have adopted a consumer product for a job the manufacturer never intended. NLP discovers these long-tail jobs because it processes hundreds of thousands of voices, including the ones that a 40-person interview study would never recruit.
Sentiment Analysis by Job Dimension
Discovering the jobs is half the work. The other half is measuring how well current products perform each job. This requires sentiment analysis applied at the job-cluster level — not at the product level.
A product's overall star rating is nearly useless for JTBD analysis. A product with a 4.2-star average could be performing one job brilliantly (4.8 stars from those customers) and another job terribly (2.9 stars from those customers). The average hides the signal.
The approach: for each review sentence assigned to a job cluster, run a sentiment classifier to score satisfaction on a -1 to +1 scale. Aggregate by job cluster. The result is a job-level satisfaction score that tells you, for each distinct job customers are hiring the product to do, how well it does that job.
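The aggregation step above is a straightforward group-by. A minimal sketch, assuming the sentence-level scores have already been produced by a sentiment model (the example pairs are illustrative):

```python
from collections import defaultdict

def job_satisfaction(scored_sentences):
    """Aggregate sentence-level sentiment (-1 to +1) into job-level scores.

    `scored_sentences` is a list of (job_cluster, sentiment) pairs, i.e.
    the output of running a sentiment classifier over every job-relevant
    sentence and tagging it with its topic cluster.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for job, score in scored_sentences:
        totals[job][0] += score
        totals[job][1] += 1
    return {job: round(s / n, 2) for job, (s, n) in totals.items()}

scored = [
    ("office focus", 0.8), ("office focus", 0.4), ("office focus", 0.3),
    ("sleep", -0.5), ("sleep", 0.1),
]
print(job_satisfaction(scored))  # {'office focus': 0.5, 'sleep': -0.2}
```

The same structure extends naturally to per-product or per-market breakdowns by widening the group-by key.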
Aspect-based sentiment analysis (ABSA) is the most appropriate NLP technique here. ABSA extracts both the aspect (what the customer is talking about) and the sentiment (how they feel about it) from each sentence. Pre-trained models like PyABSA and InstructABSA, fine-tuned on review data, achieve F1 scores above 0.80 on standard benchmarks.
For JTBD applications, the "aspects" are not product features — they are job dimensions. The noise-canceling headphones example yields results like this:
Satisfaction Score by Job: Noise-Canceling Headphones (Top 5 Products Averaged)
| Job | Avg Satisfaction (-1 to +1) | Sample Size | Satisfaction Rank |
|---|---|---|---|
| Enjoy music without external interference | 0.72 | 43,500 | 1 |
| Block airplane noise during travel | 0.68 | 53,600 | 2 |
| Focus in open-plan office | 0.51 | 67,100 | 3 |
| Take work calls in noisy environments | 0.34 | 34,700 | 4 |
| Study or read without distraction | 0.31 | 21,800 | 5 |
| Exercise with consistent audio quality | 0.28 | 16,900 | 6 |
| Reduce sensory overload (neurodivergent users) | 0.15 | 13,700 | 7 |
| Create a private space in shared living | -0.08 | 7,200 | 8 |
| Sleep in noisy conditions | -0.12 | 23,800 | 9 |
| Protect hearing in loud work environments | -0.21 | 4,300 | 10 |
Three jobs have negative satisfaction scores. Customers who hire noise-canceling headphones to sleep in noisy conditions, create privacy in shared living, or protect hearing at work are mostly dissatisfied with the product's performance at those jobs. This is precisely the kind of signal that matters for product development and market positioning. These are underserved jobs — real demand with inadequate supply.
The Job Satisfaction Gap Matrix
Satisfaction scores alone do not tell you where to invest. A job that is poorly served but unimportant to the market is a bad bet. A job that is well-served and highly important is table stakes — necessary but undifferentiated. The strategic signal comes from the intersection of importance and satisfaction.
This is the Job Satisfaction Gap Matrix — a framework that maps every discovered job on two axes: importance (how many customers have this job and how intensely they describe it) and satisfaction (how well current solutions perform this job).
Importance is estimated from the review corpus using two signals:
- Frequency — What percentage of reviews mention this job? Higher frequency implies a more common job.
- Intensity — How emphatically do customers describe this job? NLP intensity scoring measures the use of superlatives, urgency language, and emotional markers. A review saying "I absolutely need these to work on the plane" signals higher importance than "they're nice for flights."
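Intensity scoring can be approximated with a small marker lexicon. A deliberately crude sketch; the marker words and weights below are illustrative assumptions, not a published lexicon (production systems typically use a trained classifier instead):

```python
# Hypothetical urgency/superlative markers with illustrative weights.
INTENSITY_MARKERS = {
    "absolutely": 1.0, "essential": 1.0, "desperate": 1.0,
    "must": 0.9, "need": 0.8, "finally": 0.6,
    "nice": 0.1, "okay": 0.1,
}

def intensity(sentence):
    """Lexicon-based intensity score in [0, 1].

    Takes the strongest marker present; sentences with no marker
    score 0.0. Apostrophes are stripped so "they're" matches cleanly.
    """
    tokens = sentence.lower().replace("'", "").split()
    hits = [INTENSITY_MARKERS[t] for t in tokens if t in INTENSITY_MARKERS]
    return round(max(hits), 2) if hits else 0.0

print(intensity("I absolutely need these to work on the plane"))  # 1.0
print(intensity("they're nice for flights"))                      # 0.1
```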
The satisfaction gap index for each job combines both dimensions:

$$G_j = f_j \cdot \bar{I}_j \cdot (1 - S_j)$$

where $f_j$ is the frequency of job $j$, $\bar{I}_j$ is the mean intensity score, and $S_j$ is the normalized satisfaction. Jobs with high $G_j$ represent the largest unmet opportunities.
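Computing the gap index from the three signals (frequency, mean intensity, normalized satisfaction, each in [0, 1]) is a one-liner per job. The numbers below are illustrative, not real study data:

```python
def gap_index(jobs):
    """Satisfaction gap per job: frequency x mean intensity x unmet
    satisfaction. Higher values flag larger unmet opportunities.

    `jobs` maps a job name to (frequency, mean_intensity, satisfaction),
    all normalized to [0, 1].
    """
    return {job: round(f * i * (1 - s), 3)
            for job, (f, i, s) in jobs.items()}

jobs = {
    "office focus": (0.23, 0.9, 0.51),  # common, urgent, mediocre satisfaction
    "sleep":        (0.08, 0.8, 0.12),  # rarer, urgent, very unsatisfied
    "music":        (0.15, 0.5, 0.86),  # common but already well served
}
g = gap_index(jobs)
print(max(g, key=g.get))  # 'office focus'
```

Note how "music" scores lowest despite being frequently mentioned: high satisfaction drives the unmet-need term toward zero.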
The combined importance score (frequency-weighted by intensity) and the satisfaction score place each job into one of four quadrants:
Quadrant 1 — Overserved (High Satisfaction, Low Importance): Current products exceed expectations on a job that few customers care about. Investment here yields diminishing returns. This is where feature bloat lives.
Quadrant 2 — Table Stakes (High Satisfaction, High Importance): Current products perform well on important jobs. You must maintain performance here, but differentiation is difficult because competitors have also solved this job.
Quadrant 3 — Opportunity (Low Satisfaction, High Importance): Customers care about this job. Current products fail at it. This is where product innovation and marketing messaging should focus. This is the money quadrant.
Quadrant 4 — Niche (Low Satisfaction, Low Importance): Current products fail at a job that few customers have. Addressing this job makes sense only if the niche is growing or if the customers who have this job are high-value.
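The quadrant assignment reduces to two threshold comparisons. A sketch assuming 0-to-100 scales; the cutoff values are assumptions (teams often use the corpus median per axis rather than a fixed number):

```python
def quadrant(importance, satisfaction, imp_cut=60, sat_cut=60):
    """Place a job in the Satisfaction Gap Matrix.

    `importance` and `satisfaction` are on 0-100 scales; the cutoffs
    are illustrative assumptions, not canonical values.
    """
    hi_imp = importance >= imp_cut
    hi_sat = satisfaction >= sat_cut
    if hi_sat and not hi_imp:
        return "Q1: Overserved"
    if hi_sat and hi_imp:
        return "Q2: Table Stakes"
    if hi_imp:
        return "Q3: Opportunity"
    return "Q4: Niche"

print(quadrant(93, 51))   # 'Q3: Opportunity' -- important, poorly served
print(quadrant(63, -12))  # 'Q3: Opportunity'
print(quadrant(40, 72))   # 'Q1: Overserved'
```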
The matrix reveals clear strategic priorities (importance and satisfaction here rescaled to 0–100). "Office focus" sits in Quadrant 3 — the highest importance score (93) with mediocre satisfaction (51). This is the biggest opportunity in the category. "Sleep" combines high importance (63) with negative satisfaction (-12) — another Quadrant 3 job that current products badly underserve. "Work calls" has moderate importance (68) with low satisfaction (34) — a third Quadrant 3 opportunity.
Meanwhile, "Music enjoyment" sits in Quadrant 2 — high satisfaction, moderate importance. This is table stakes. Every product in the category does this well enough. Marketing messages that lead with "incredible sound quality" are fighting over table stakes. Messages that lead with "finally focus in your open office" or "actually sleep on a redeye" are speaking to underserved jobs.
NLP-Discovered Jobs vs. Interview-Discovered Jobs
A reasonable objection: how do we know NLP-discovered jobs are real? What if the topic clusters are artifacts of word frequency rather than genuine customer motivations?
Validation comes from comparing NLP results with traditional interview studies conducted on the same product category. Three published comparisons exist, plus two proprietary studies whose aggregate results have been shared at conferences.
The pattern is consistent. NLP and interviews discover a largely overlapping set of core jobs — typically 70-80% overlap. The differences are instructive in both directions.
Jobs that interviews discover but NLP misses:
- Highly emotional or social jobs that customers do not write about in reviews. "I want to look like I take my work seriously" is a social job associated with premium headphones that surfaces in interviews but almost never in reviews. Customers describe functional performance in reviews. They describe social signaling in conversations.
- Pre-purchase jobs related to the decision process itself. "Help me choose confidently between options" is a job that shapes purchase behavior but is rarely mentioned in post-purchase reviews.
Jobs that NLP discovers but interviews miss:
- Long-tail use cases from populations that JTBD researchers would not recruit. Neurodivergent users hiring noise-canceling headphones for sensory regulation. Night-shift workers using them for daytime sleep. People with misophonia using them at family dinners. These users exist in the review corpus but would not appear in a standard interview recruitment panel.
- Negative jobs — things customers are trying to avoid. "Avoid looking antisocial while signaling that I don't want to be interrupted" is a job that NLP detects from clusters of reviews mentioning wearing headphones without audio playing. This job is socially awkward to articulate in an interview but surfaces naturally in anonymous reviews.
Job Discovery Overlap: NLP vs. Interview Methods (Aggregated Across Five Category Studies)
| Category | Jobs Found by Both | Jobs Only in Interviews | Jobs Only in NLP | Total Unique Jobs |
|---|---|---|---|---|
| Noise-Canceling Headphones | 7 | 2 | 4 | 13 |
| Smart Home Speakers | 6 | 3 | 5 | 14 |
| Project Management Software | 9 | 1 | 3 | 13 |
| Electric Toothbrushes | 5 | 2 | 3 | 10 |
| Meal Kit Delivery | 8 | 2 | 4 | 14 |
The takeaway is not that NLP is better or worse than interviews. It is that they have complementary blind spots. NLP excels at breadth and long-tail discovery. Interviews excel at emotional depth and social job extraction. The optimal approach — reflected in the hybrid method that discovers the most total jobs — runs NLP first to identify the job landscape at scale, then conducts targeted interviews to deepen understanding of the highest-priority jobs and fill in the emotional and social dimensions that NLP misses.
This hybrid reduces interview cost by 60-70% because you are not spending interviews on discovery. You already know what the jobs are. You are spending interviews on depth.
Multi-Market Job Comparison
One of the most powerful applications of NLP-based JTBD analysis is comparing job structures across markets. Traditional JTBD research in a single market costs $100,000 to $300,000. Conducting the same study in five markets costs five times as much. NLP-based analysis scales at near-zero marginal cost — the pipeline is the same, only the input corpus changes.
This enables a question that most product teams have never been able to afford: do customers in different markets hire the same product for the same jobs?
The answer, consistently, is no.
An NLP analysis of 820,000 reviews for robot vacuum cleaners across four markets revealed striking differences:
Pet hair management is a dominant job in the US (22% of mentions) and nearly nonexistent in Japan (3%). This tracks with pet ownership rates but is not something a US-based product team would think to verify when entering the Japanese market. They might lead their Japanese marketing with a pet hair message — a job that 97% of the Japanese market does not have.
Hygiene standards as a job is four times more important in Germany and Japan than in the US. Marketing messages about "deep cleaning" and "antibacterial" performance would resonate in Munich and Tokyo but feel oddly medical in Memphis.
Time savings — "reduce time spent on housework" — is the top job in Brazil (38%) and Germany (32%) but ranks lower in Japan (15%), where the dominant job is maintenance between deep cleans rather than replacing the cleaning itself.
These differences are invisible without cross-market job analysis. And cross-market job analysis was effectively impossible at scale before NLP.
Competitive Job Analysis
The Job Satisfaction Gap Matrix becomes even more powerful when applied comparatively across competitors. Instead of asking "which jobs are underserved in the category?" you ask "which jobs are underserved by Product A but well-served by Product B?"
This produces a competitive job map — a view of where each competitor has job-level advantages and vulnerabilities.
The analytical process: collect reviews for each major competitor in the category, run the same BERTopic pipeline on the combined corpus to discover a shared job taxonomy, then measure satisfaction separately for each competitor on each job.
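The final measurement step is a pivot over (competitor, job, sentiment) triples. A minimal sketch with illustrative scores:

```python
from collections import defaultdict

def competitive_job_map(rows):
    """Pivot (competitor, job, sentiment) triples into a job-by-competitor
    satisfaction table, after clustering against a shared job taxonomy.

    Each sentiment score is in [-1, +1]; cells hold the per-cell mean.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for comp, job, score in rows:
        cell = sums[(job, comp)]
        cell[0] += score
        cell[1] += 1
    table = defaultdict(dict)
    for (job, comp), (s, n) in sums.items():
        table[job][comp] = round(s / n, 2)
    return dict(table)

rows = [
    ("Sony", "sleep", -0.2), ("Sony", "sleep", -0.1),
    ("Bose", "sleep", -0.1), ("Bose", "office focus", 0.6),
]
print(competitive_job_map(rows)["sleep"])  # {'Sony': -0.15, 'Bose': -0.1}
```

The crucial design choice is running topic modeling on the combined corpus first, so every competitor is scored against the same job clusters rather than each growing its own taxonomy.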
Competitive Job Satisfaction Map: Noise-Canceling Headphones (Top 4 Competitors)
| Job | Sony WH-1000XM5 | Bose QC Ultra | Apple AirPods Max | Sennheiser Momentum 4 |
|---|---|---|---|---|
| Office focus | 0.58 | 0.62 | 0.41 | 0.43 |
| Airplane travel | 0.71 | 0.74 | 0.55 | 0.62 |
| Music enjoyment | 0.78 | 0.65 | 0.82 | 0.84 |
| Work calls | 0.31 | 0.45 | 0.52 | 0.22 |
| Sleep | -0.15 | -0.08 | -0.31 | -0.18 |
| Exercise | 0.18 | 0.12 | -0.05 | 0.25 |
| Sensory overload | 0.22 | 0.18 | 0.08 | 0.11 |
This table tells specific, actionable stories. Bose leads on the two highest-importance jobs (office focus and airplane travel). Apple leads on music enjoyment (table stakes) and work calls (a real job-level advantage likely driven by Apple device integration). Every product fails the sleep job. Sennheiser has a quiet advantage on music enjoyment and exercise but loses badly on work calls.
If you are Sennheiser, this table tells you: do not compete with Bose on noise cancellation messaging. Compete on the music and exercise jobs where you have a genuine advantage, and invest in microphone quality to close the work-calls gap. If you are a new entrant, the sleep job is wide open — every incumbent is failing it.
From Jobs to Features to Marketing Messages
The job map produces a clear path from research to action. Each underserved job implies both a product feature priority and a marketing message.
The translation follows a formula that Ulwick codified in his ODI framework:
- Identify the underserved job — from the Satisfaction Gap Matrix.
- Extract the outcome statements — from the NLP corpus, pull the specific language customers use to describe what success and failure look like for this job.
- Map outcomes to features — for each unmet outcome, identify the product feature (existing or new) that would address it.
- Craft the message — use the customer's own language to describe the job, not yours.
Step 4 is where most marketing teams fail. They translate the job into corporate language. "Our advanced ANC technology provides superior noise isolation for enhanced productivity." No customer has ever described their job this way. They say: "I need to focus when the office is chaos." Or: "My coworkers won't stop talking and I can't think."
NLP-based JTBD research gives you not just the jobs but the exact language customers use to describe them. This language — pulled directly from reviews — becomes the raw material for ad copy, landing pages, and content marketing. It is the voice of the customer at scale, organized by job.
Here is what the translation looks like for the top three underserved jobs in noise-canceling headphones:
Job: Focus in open-plan office
- Customer language (from reviews): "drowns out the chatter," "finally get deep work done," "the only way I survive the open office"
- Feature implication: ANC tuned for human voice frequency range (not just airplane engine frequencies), comfort for 8+ hour wear, transparency mode that lets specific voices through
- Marketing message: "Your open office won't shut up. These will."
Job: Sleep in noisy conditions
- Customer language: "tried to sleep with these on," "kept waking up when they fell off," "they press too hard on my ear when I'm on my side"
- Feature implication: Ultra-thin profile for side sleeping, soft padding that does not create pressure points, auto-off timer, white noise mode
- Marketing message: "Built for the people who can hear everything at 2am."
Job: Take work calls in noisy environments
- Customer language: "my team can hear the coffee shop behind me," "the mic picks up everything," "I sound terrible on Zoom"
- Feature implication: Beam-forming microphone array, AI voice isolation on outbound audio, real-time background noise suppression for calls
- Marketing message: "Sound like you're in a quiet room. Even when you're not."
Building a JTBD-Driven Content Strategy
The job map also restructures content strategy. Instead of organizing content by product feature or marketing funnel stage, you organize it by job.
Each job implies a content cluster: a set of search queries, content topics, and information needs that a person holding that job would have before, during, and after hiring a product.
The "sleep in noisy conditions" job implies content about:
- Best noise-canceling headphones for sleep (comparison content)
- How to block noise without headphones (educational content that naturally leads to headphone recommendation)
- White noise vs. noise cancellation for sleep (consideration-stage content)
- Can you sleep with over-ear headphones? (objection-handling content)
- Sleep headphones vs. earplugs vs. white noise machines (competitive job framing)
Each of these content pieces targets a customer at a different stage of their journey toward hiring a product for the sleep job. The content is organized around the job, not the product. It speaks the language of the customer's problem, not the language of the product's specification sheet.
This approach produces two measurable advantages:
First, higher conversion rates. Content that matches a customer's specific job converts at 2-4x the rate of generic product content because it demonstrates understanding of the customer's circumstance. The customer thinks: "This is exactly my situation."
Second, lower customer acquisition cost. Job-based keyword targeting captures long-tail search queries that have lower competition and higher purchase intent. "Best headphones for sleeping on planes" has dramatically lower cost-per-click than "best noise canceling headphones" and dramatically higher conversion probability.
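The cost-per-acquisition arithmetic behind that claim is simple division. A minimal sketch — every number below (CPCs, conversion rates) is an illustrative assumption, not measured data:

```python
# Expected acquisition cost for a head keyword vs. a job-specific long-tail
# keyword. All figures are hypothetical, chosen only to show the mechanics.

def cost_per_acquisition(cpc: float, conversion_rate: float) -> float:
    """Expected ad spend per converted customer: CPC / conversion rate."""
    return cpc / conversion_rate

# "best noise canceling headphones": high competition, generic intent.
head = cost_per_acquisition(cpc=4.50, conversion_rate=0.01)

# "best headphones for sleeping on planes": low competition, job-specific intent.
long_tail = cost_per_acquisition(cpc=1.20, conversion_rate=0.04)

print(f"head keyword CAC:      ${head:.2f}")
print(f"long-tail keyword CAC: ${long_tail:.2f}")
```

Even if the real numbers differ, the structure of the advantage is the same: the long-tail term wins on both inputs at once, so the gap in acquisition cost compounds.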
A JTBD content matrix organizes this systematically:
JTBD Content Matrix: Noise-Canceling Headphones (Partial)
| Job | Awareness Content | Consideration Content | Decision Content | Post-Purchase Content |
|---|---|---|---|---|
| Office focus | Open office productivity statistics | Headphones vs. office pods vs. WFH | Best ANC headphones for office use | How to set up ANC profiles for your office |
| Sleep | Why noise ruins sleep (and what to do) | Earplugs vs. white noise vs. ANC for sleep | Best headphones for sleeping | Optimal ANC settings for sleep |
| Work calls | Why you sound bad on Zoom calls | Headset vs. headphones vs. earbuds for calls | Best headphones for video calls in noisy spaces | Mic settings to improve call quality |
| Sensory overload | Sensory processing and noise sensitivity | How ANC helps neurodivergent people | Most comfortable headphones for extended wear | Creating sensory-friendly environments at home and work |
Each cell is a piece of content. Each row is a job. The content strategy is the matrix. No guessing. No "what should we write about this quarter?" The jobs tell you what to write, and the satisfaction scores tell you which jobs to prioritize.
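One way to operationalize the matrix is as a mapping from (job, funnel stage) to a content topic, with jobs ordered by how underserved they are. A minimal stdlib sketch — the cells are drawn from the table above, but the opportunity scores and the scoring scheme are illustrative assumptions:

```python
# JTBD content matrix as (job, stage) -> content topic. The writing queue
# orders cells by job opportunity score (higher = more underserved), then by
# funnel stage. Scores are hypothetical, standing in for satisfaction data.

matrix = {
    ("sleep", "consideration"): "Earplugs vs. white noise vs. ANC for sleep",
    ("sleep", "decision"): "Best headphones for sleeping",
    ("office focus", "decision"): "Best ANC headphones for office use",
    ("work calls", "awareness"): "Why you sound bad on Zoom calls",
}

opportunity = {"sleep": 8.1, "work calls": 7.4, "office focus": 5.9}

stage_order = ["awareness", "consideration", "decision", "post-purchase"]
queue = sorted(
    matrix.items(),
    key=lambda kv: (-opportunity[kv[0][0]], stage_order.index(kv[0][1])),
)
for (job, stage), topic in queue:
    print(f"{job:12s} {stage:13s} {topic}")
```

The point of the structure is the one made in the text: the jobs (rows) and satisfaction scores decide what gets written next, not a quarterly brainstorm.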
Limitations of Automated JTBD Research
NLP-based JTBD research has real limitations that are worth being honest about, because overpromising on a method is how methods get discredited.
Limitation 1: Reviews are post-purchase only. Reviews come from people who already bought a product. This means the review corpus is biased toward people who found a solution, not people who searched for a solution and gave up. The unserved market — people with a job for which no current product is adequate — is underrepresented. Interviews can reach these people. Reviews cannot.
Limitation 2: Social and emotional jobs are underrepresented. People write about functional performance in reviews. They rarely write "I bought these headphones because I wanted my coworkers to think I'm the kind of person who owns nice headphones." Social jobs are real and powerful. They are also invisible in review text. NLP captures functional jobs reliably, emotional jobs partially, and social jobs almost never.
Limitation 3: Astroturfing and fake reviews contaminate the corpus. Approximately 30-40% of Amazon reviews in some categories are fraudulent or incentivized. These fake reviews tend to be generic ("great product, works as described") and add noise without adding job signal. Preprocessing filters remove some of them — reviews under 30 words, reviews with suspicious language patterns, reviews from accounts with unusual activity — but contamination persists.
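The preprocessing filters described above can be sketched as a simple prefilter pass. A minimal version, assuming the 30-word threshold from the text plus an illustrative generic-phrase list — this is a heuristic sketch, not a validated fraud detector:

```python
import re

# Phrases typical of generic or incentivized reviews; an illustrative list,
# not an exhaustive one.
GENERIC_PHRASES = [
    "great product",
    "works as described",
    "as advertised",
    "five stars",
    "highly recommend",
]

def keep_review(text: str, min_words: int = 30) -> bool:
    """Heuristic prefilter: drop short reviews and reviews dominated by
    generic boilerplate, which add noise without adding job signal."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if len(words) < min_words:
        return False
    generic_hits = sum(phrase in lowered for phrase in GENERIC_PHRASES)
    # A review built mostly from stock phrases is also suspect.
    return generic_hits < 2

reviews = [
    "Great product, works as described. Five stars!",
    "I bought these to sleep next to a snoring partner. They stay on when "
    "I roll onto my side, the padding never presses on my ear, and the "
    "auto-off timer means I do not wake up to a dead battery. First full "
    "night of sleep in months.",
]
print([keep_review(r) for r in reviews])  # [False, True]
```

A production pipeline would add the account-level signals mentioned above (posting cadence, review velocity), which this text-only sketch cannot see.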
Limitation 4: The method discovers jobs but not their causal structure. NLP can tell you that customers describe a job. It cannot tell you the causal chain that leads from circumstance to job to purchase. It cannot tell you which jobs are actually driving purchase decisions versus which jobs are pleasant side benefits that customers mention but would not pay for. Only experimental methods — A/B testing messages against different jobs, or conjoint analysis — can establish the causal link between job presence and willingness to pay.
Limitation 5: Language and cultural bias. Review language varies across markets not just in language but in style. German reviewers tend toward technical detail. Japanese reviewers tend toward contextual description. American reviewers tend toward superlatives and emotional expression. These cultural differences in review style can bias cross-market job comparisons if not accounted for in the NLP pipeline.
Limitation 6: Topic coherence is not job coherence. A BERTopic cluster that is statistically coherent — meaning the sentences in it are semantically similar — is not necessarily a "job" in the Christensen sense. It might be a feature complaint, a packaging issue, or a customer service grievance. Human judgment is required to map topic clusters to jobs. Automating this mapping is an active area of research (with promising results from LLM-based classification) but is not yet reliable enough to run without human review.
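The cluster-to-job mapping step can be sketched as a keyword-cue classifier that routes ambiguous clusters to human review. Everything below — the cue lists, the margin rule — is an illustrative assumption, not a production classifier; in practice this step is increasingly done with an LLM, but the human-review fallback is the same:

```python
# Route a topic cluster (represented by its top terms) to a candidate
# category; anything without a clear winner is flagged for human review.
# Cue vocabularies and the margin threshold are illustrative assumptions.

CATEGORY_CUES = {
    "job": {"sleep", "focus", "commute", "calls", "work", "study"},
    "feature complaint": {"battery", "bluetooth", "pairing", "app", "firmware"},
    "service grievance": {"refund", "shipping", "warranty", "support", "return"},
}

def classify_cluster(top_terms: list[str]) -> str:
    scores = {
        cat: len(cues.intersection(top_terms))
        for cat, cues in CATEGORY_CUES.items()
    }
    best = max(scores, key=scores.get)
    runner_up = sorted(scores.values())[-2]
    # Require a clear margin over the runner-up; otherwise defer to a human.
    if scores[best] == 0 or scores[best] - runner_up < 2:
        return "needs human review"
    return best

print(classify_cluster(["sleep", "side", "night", "focus", "quiet"]))
print(classify_cluster(["battery", "sleep", "refund", "case", "color"]))
```

The deferral branch is the important part: a statistically coherent cluster that scores evenly across categories is exactly the case where automated labeling fails and a human must decide whether it is a job at all.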
Despite these limitations, the economics are overwhelming. A full NLP-based JTBD analysis costs one-tenth as much as an interview study, runs in one-quarter the time, and discovers jobs that interviews structurally miss. The right question is not "should we use NLP instead of interviews?" It is "why would we ever conduct interviews without running NLP first?"
The future of JTBD research is not qualitative or quantitative. It is both, in sequence. NLP for breadth. Interviews for depth. The job map as the connective tissue between what customers say and what products should do.
Further Reading
- Clayton Christensen on Wikipedia — The originator of JTBD theory
- BERTopic (GitHub) — Topic modeling with transformers
- Outcome-Driven Innovation — Tony Ulwick's quantitative JTBD methodology
References
- Christensen, C. M., & Raynor, M. E. (2003). The Innovator's Solution: Creating and Sustaining Successful Growth. Harvard Business School Press.
- Christensen, C. M., Dillon, K., Hall, T., & Duncan, D. S. (2016). Competing Against Luck: The Story of Innovation and Customer Choice. Harper Business.
- Ulwick, A. W. (2005). What Customers Want: Using Outcome-Driven Innovation to Create Breakthrough Products and Services. McGraw-Hill.
- Ulwick, A. W. (2016). Jobs to Be Done: Theory to Practice. IDEA BITE PRESS.
- Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.
- Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.
- Egger, R., & Yu, J. (2022). A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts. Frontiers in Sociology, 7, 886498.
- Zhang, W., Li, X., Deng, Y., Bing, L., & Lam, W. (2022). A Survey on Aspect-Based Sentiment Analysis. ACM Computing Surveys, 55(1), 1-36.
- Fader, P. S., & Hardie, B. G. S. (2005). A Note on Deriving the Conditional PMF of the BG/NBD Model. Working Paper.
- Moody, C. E. (2016). Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec. arXiv preprint arXiv:1605.02019.
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, 4171-4186.
- Kano, N. (1984). Attractive Quality and Must-Be Quality. Journal of the Japanese Society for Quality Control, 14(2), 39-48.
The Conversation
4 replies
We ran topic-modeling plus LLM-clustering on ~400K support tickets last year and the most valuable output wasn't the JTBD list itself — it was the 'job frequency x pain severity' matrix. The top-quadrant jobs were obvious. The bottom-quadrant had 3 jobs we'd never built for that represented ~9% of tickets and 4x the average churn rate. That's where the roadmap value lived.
I'll push back on "interviews miss jobs customers can't articulate". Good interviewing surfaces exactly that — but it requires technique most PMs don't have. NLP on reviews is a wonderful scaling tool, but I've seen it produce confidently-wrong job clusters that any interview would have caught as artifacts of review-writing conventions rather than genuine jobs. Use both; don't treat NLP as a replacement.
Christensen and Ulwick had a famous disagreement about whether 'jobs' are objective (Ulwick) or interpretive (Christensen). NLP pipelines implicitly assume jobs are objective — extractable from text as latent structures. That's a philosophical commitment worth making explicit because it shapes everything downstream including how you validate the clusters.
practical note: review data has a massive survivor bias — only users with strong positive or negative experiences write them. silent middle doesn't show up. we found this distorts the 'job importance' weighting by a factor of 2-3x for emotional jobs vs functional ones. worth weighting reviews by user session-volume before clustering.