The Topical Authority Audit: Measuring Coverage Without Counting URLs

TL;DR: Topical authority is treated by most SEO programmes as a URL-count problem ("publish 30 articles on the topic") when it is structurally an entity-coverage and semantic-completeness problem. Google's documented systems (the Knowledge Graph, the Hummingbird and BERT updates, the NLP API for entity extraction) read content as a graph of entities and relationships, not as a stack of standalone pages. The right audit treats the topic as a population of entities and sub-topics, scores the site's coverage of that population, and prioritises gaps. Counting URLs is at best a weak proxy and at worst an active misdirection that produces thin pages with overlapping topics and no measurable lift.

A note on tools and brands. Google, Search Engine Journal, Search Engine Land, Moz, SEO by the Sea (Bill Slawski's archive), Holistic SEO (Koray Tuğberk GÜBÜR's work), and the major SEO toolkits appear throughout this essay as reference sources for documented systems and frameworks. Quantitative claims framed as advisory-engagement observation come from anonymized partner operators, not from the named companies. Public documentation and patent citations are linked inline.

What People Mean by Topical Authority

Topical authority is one of those terms in the SEO vocabulary that means different things to different audiences. To the marketing-tool vendors, it is a domain-level numerical score correlated with rankings on topic-cluster queries. To the editorial-strategy consultants, it is roughly synonymous with "covering a subject deeply." To the SEO patent literature and the public statements from Google's search team, it is closer to the question of whether a site has accumulated enough signals (content, entities, relationships, citations, behavioural evidence) for the search engine's content-classification systems to confidently associate the site with a topic.

The operational point of the term, for someone trying to grow organic visibility, is that it predicts a meaningful share of the variation in which sites rank for which queries within a topic cluster. Sites that the search system reads as topically authoritative tend to outrank otherwise-comparable sites on long-tail informational queries, on emerging sub-topics within the cluster, and on queries where the search system has to make a judgement call about which source is more likely to be reliable.

The widespread interpretation of this as a URL-count problem is a category error. The Hummingbird update of 2013 shifted Google's query understanding from keyword matching toward entity matching; the BERT update of 2019 shifted its contextual understanding from bag-of-words toward bidirectional contextual embeddings; the Multitask Unified Model (MUM) of 2021 added cross-language and multimodal capability. All three moved Google's content understanding further from a per-page keyword model toward a per-topic entity-relationship model. URLs are units of crawling and indexing; entities are units of meaning. The two are related but not interchangeable.

The Entity Model Behind Modern Search

To audit topical authority by coverage, the audit needs an explicit model of what counts as coverage. The model that maps best to how Google's documented systems operate is an entity-relationship model derived from the published patent and NLP literature.

The Knowledge Graph, announced by Google in 2012 and substantially expanded since, is the public-facing version of an entity store. Each entity (a person, a place, a concept, an organisation, a product, a procedure) has an identifier, a set of attributes, and a set of relationships to other entities. The Knowledge Graph is one of several entity stores Google operates; the broader system includes various crawl-time entity extractors and the Cloud Natural Language API (the "NLP API" referred to throughout this essay), which Google offers publicly via Google Cloud and which exposes a salience-scored entity extraction interface.

Until his death in 2022, Bill Slawski maintained the most thorough public reading of Google's patent filings on entity-based ranking in his archive at SEO by the Sea. The patents he tracked included entity disambiguation systems, entity-salience scoring, query interpretation as entity resolution, and the various follow-on systems that use entities as features in ranking. His core observation, repeated across many essays, was that Google's published patents had been describing entity-based retrieval for at least a decade before practitioners started taking the idea seriously, and that "topical authority" in any meaningful sense was already an entity-coverage question rather than a keyword-density question.

The entity-based SEO frameworks developed by practitioners like Koray Tuğberk GÜBÜR, Cyrus Shepard, and Aleyda Solis since the late 2010s have built on this foundation. The Koray framework in particular operationalised entity coverage as a planning input: identify the topic's entity population, score the site's coverage of each, build out the missing pieces. The framework is structurally compatible with how Google's documented systems work, which is the reason it has produced repeatable results for the operators who have adopted it.

The entity model has implications for what the audit measures. A site whose content covers 200 of the 300 entities in a topical neighbourhood, with each entity covered to a reasonable depth and with sensible interlinking, is more topically authoritative than a site with 600 URLs that redundantly cover the same 50 entities. The first site is structurally legible to the entity-based ranking systems; the second is a noise emitter.
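The comparison above can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions: the entity labels (`e0` to `e299`), the URL pattern, and the helper name `coverage_ratio` are all hypothetical, and "coverage" here means presence only, not depth.

```python
def coverage_ratio(covered: set[str], population: set[str]) -> float:
    """Share of the topic's entity population the site covers at all."""
    if not population:
        return 0.0
    return len(covered & population) / len(population)


# Hypothetical topic population of 300 entities, labelled e0..e299.
population = {f"e{i}" for i in range(300)}

# Site A: 200 distinct entities covered.
site_a = {f"e{i}" for i in range(200)}

# Site B: 600 URLs, but every URL covers one of the same 50 entities.
site_b_urls = {f"/post-{n}": f"e{n % 50}" for n in range(600)}
site_b = set(site_b_urls.values())

print(f"Site A coverage: {coverage_ratio(site_a, population):.0%}")
print(f"Site B coverage: {coverage_ratio(site_b, population):.0%} "
      f"across {len(site_b_urls)} URLs")
```

The URL count never enters the ratio, which is the point: the second site's 600 pages collapse to 50 distinct entities once they are mapped onto the population.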

Building the Entity Population

The first step in an entity-coverage audit is to enumerate the topic's entity population. This is the hardest part of the audit, and the part that vendors most often skip. The published taxonomies and ontologies for most subjects are incomplete; the practitioner techniques for building the entity list combine several inputs.

The first input is the Wikipedia cluster for the topic. Wikipedia's category trees and the structured property data behind them are not complete representations of any topic, but they are useful starting points for canonical entities and their canonical attributes. For a topic like "marketing attribution," the Wikipedia entry's "See also" section, the linked-from articles, and the underlying entity graph collectively name the canonical sub-entities (multi-touch attribution, marketing mix modelling, last-click attribution, incrementality testing, ad attribution, conversion tracking) that any topically authoritative site would be expected to cover.

The second input is Google's own entity surface. The Knowledge Panel for a head term, the People Also Ask box for representative queries, the related-search recommendations at the bottom of the SERP, and the entities that appear in Google's NLP API extraction on the top-ranking pages collectively reveal the entity neighbourhood that Google associates with the topic. The People Also Ask surface in particular is a useful expansion mechanism: each PAA answer tends to reveal an entity or sub-topic that Google considers adjacent to the queried entity, and clicking through expands the tree.

The third input is the competitive content audit. The top-ranking sites for the head term and for the immediate variants collectively cover the entities that the search system has found sufficient for ranking on that cluster. A topical audit that fails to cover entities that all the top-ranking competitors cover is a topical audit with structural gaps, regardless of how many URLs the site has on the topic.
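One way to operationalise the competitive input is a set comparison: which entities does every (or nearly every) top-ranking competitor cover that the site does not? The sketch below assumes the entity sets have already been extracted by hand or by tooling; the competitor names, the entity sets, and the `min_share` parameter are illustrative assumptions.

```python
from collections import Counter


def consensus_gaps(site_entities, competitor_entities, min_share=1.0):
    """Entities named by at least `min_share` of competitors but absent from the site."""
    counts = Counter()
    for entities in competitor_entities.values():
        counts.update(set(entities))
    threshold = min_share * len(competitor_entities)
    return sorted(e for e, n in counts.items()
                  if n >= threshold and e not in site_entities)


# Hypothetical extraction results for three top-ranking competitors.
competitors = {
    "competitor-a": {"multi-touch attribution", "incrementality testing", "conversion tracking"},
    "competitor-b": {"multi-touch attribution", "incrementality testing", "last-click attribution"},
    "competitor-c": {"multi-touch attribution", "incrementality testing", "conversion tracking"},
}
site = {"multi-touch attribution", "last-click attribution"}

print(consensus_gaps(site, competitors))                 # covered by every competitor
print(consensus_gaps(site, competitors, min_share=0.5))  # covered by at least half
```

Lowering `min_share` widens the gap list at the cost of importing more competitor-specific noise, which mirrors the blind-spot caveat about this input.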

The fourth input is the practitioner literature. Industry conferences, academic syllabi, books on the topic, and the major trade publications collectively name entities that the operating community treats as the canonical sub-topics. Combining the four inputs produces a working entity list that is rarely complete but is usually thorough enough to drive a useful audit.

Sources for Topic Entity Population Enumeration

Source	What It Reveals	Strengths	Limitations
Wikipedia	Canonical entities and their relationships	Stable, well-curated, machine-readable	Coverage uneven across topics; lags emerging sub-topics
Google Knowledge Panel and PAA	Entity neighbourhood Google associates with topic	Reflects current Google representation	Locale-dependent; not comprehensive for niche topics
Top-ranking competitor content	Entities the search system has found sufficient for ranking	Reflects what actually works in current SERPs	Reproduces competitor blind spots; competitors may be incomplete too
Practitioner literature and conferences	Sub-topics the operating community treats as canonical	Captures emerging entities before Google does	Subjective; reflects practitioner bias toward novelty
Google NLP API entity extraction	Programmatic entity extraction from competitor pages	Scales; gives salience scores	API extraction is not identical to ranking-time entity model
Search Console query data	Queries the site already ranks for, including unexpected long-tail	Reflects actual user behaviour	Conditional on what the site already covers; misses entities the site has no coverage on

The entity list is iterative. The first pass produces a rough enumeration; the audit refines it as it runs. The discipline that matters is to treat the list as a topic population to be sampled and covered, rather than as a content brief to be checked off.
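A minimal sketch of how the inputs might be merged into one working list while keeping track of which source named each entity. The source labels and entity names are hypothetical, and the normalisation is deliberately naive: it folds case and whitespace only, so aliases (for example "MMM" versus "marketing mix modelling") would still need manual reconciliation.

```python
def build_entity_population(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each normalised entity name to the set of sources that named it."""
    population: dict[str, set[str]] = {}
    for source_name, entities in sources.items():
        for entity in entities:
            key = " ".join(entity.lower().split())  # case- and whitespace-insensitive
            population.setdefault(key, set()).add(source_name)
    return population


# Hypothetical, hand-collected inputs for the "marketing attribution" topic.
sources = {
    "wikipedia": {"Multi-touch attribution", "Marketing mix modelling"},
    "google_surface": {"multi-touch attribution", "Conversion tracking"},
    "competitors": {"Multi-touch  attribution", "Incrementality testing"},
    "practitioner_literature": {"Incrementality testing", "Marketing mix modelling"},
}

population = build_entity_population(sources)
# Entities named by the most sources come first; ties break alphabetically.
for entity, named_by in sorted(population.items(), key=lambda kv: (-len(kv[1]), kv[0])):
    print(f"{entity}: {len(named_by)} source(s)")
```

Keeping the provenance set per entity makes later passes cheaper: an entity named by a single source is a candidate for review, not automatically a candidate for removal.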

Scoring Coverage

Once the entity population is enumerated, the audit scores the site's coverage of each entity. The scoring rubric has four dimensions: presence (does the site have content addressing the entity at all), depth (is the coverage substantive enough to be ranking-competitive), prominence (is the coverage in a structurally appropriate page or buried in a tangential one), and freshness (is the coverage current enough that the site is not visibly behind on the topic).

Presence is binary or near-binary: the entity is named on at least one indexed page on the site, or it is not. Search Console's coverage data and a simple site-search query are sufficient for this dimension.

Depth is gradient. A passing mention of an entity in one paragraph of one article is shallow coverage. A dedicated section of 400 to 800 words is medium coverage. A dedicated article or pillar page with 2,000 to 5,000 words of substantive treatment is deep coverage. Different entities warrant different depths: the canonical head entities of the topic warrant deep coverage, the secondary entities warrant medium coverage, the tertiary entities can be adequately addressed with shallow coverage that links to a stronger source.
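The depth bands can be encoded as a small classifier. This is a sketch, not a standard: the 400-word and 2,000-word cut-offs come from the bands above, but treating everything between 800 and 2,000 words as "medium" is an assumption, as are the names `depth_bucket` and `depth_shortfall`.

```python
DEPTH_ORDER = ["none", "shallow", "medium", "deep"]

# Target depth per entity tier, as described in the text.
EXPECTED_DEPTH = {"head": "deep", "secondary": "medium", "tertiary": "shallow"}


def depth_bucket(words_on_entity: int) -> str:
    """Classify coverage depth from the word count devoted to the entity."""
    if words_on_entity <= 0:
        return "none"
    if words_on_entity < 400:
        return "shallow"
    if words_on_entity < 2000:  # assumption: the 800-2,000 gap counts as medium
        return "medium"
    return "deep"


def depth_shortfall(tier: str, words_on_entity: int) -> int:
    """Depth levels by which coverage falls short of the tier's target (0 if met)."""
    actual = DEPTH_ORDER.index(depth_bucket(words_on_entity))
    target = DEPTH_ORDER.index(EXPECTED_DEPTH[tier])
    return max(0, target - actual)


print(depth_bucket(600), depth_shortfall("head", 600))
print(depth_bucket(3000), depth_shortfall("tertiary", 3000))
```

Word count is only a proxy for substantive treatment; a reviewer should still override the bucket when a long page says little.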

Prominence is structural. An entity covered in a dedicated page that is reached in two clicks from the homepage and that is linked-to from related pages on the site is structurally prominent. An entity covered in a page that is buried in the archive, has no internal inbound links, and is reached only through search is structurally invisible. Prominence affects whether the search system reads the coverage as a strong topical signal or as orphaned content.
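Click depth and inbound-link counts can both be derived from a crawl of the site's internal links. The sketch below assumes the crawl has already been reduced to a dictionary mapping each URL to the URLs it links to; the URLs, the function names, and the two-click threshold used as the default are illustrative.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage; unreachable pages are omitted."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def inbound_counts(links: dict[str, list[str]]) -> dict[str, int]:
    """Number of distinct pages linking to each page (self-links ignored)."""
    counts: dict[str, int] = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


def is_prominent(page: str, links: dict[str, list[str]], home: str, max_clicks: int = 2) -> bool:
    depth = click_depths(links, home).get(page)
    return depth is not None and depth <= max_clicks and inbound_counts(links).get(page, 0) > 0


# Hypothetical internal-link graph.
links = {
    "/": ["/attribution", "/blog"],
    "/attribution": ["/attribution/mmm"],
    "/attribution/mmm": ["/attribution"],
    "/blog": [],
    "/archive/2019/old-mmm-post": ["/attribution"],
}
print(is_prominent("/attribution/mmm", links, "/"))
print(is_prominent("/archive/2019/old-mmm-post", links, "/"))
```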

Freshness is the time dimension. The freshness signal that matters for topical authority is not whether the page has a recent date stamp; it is whether the coverage reflects the current state of the topic. An entity that has evolved since the page was written, whose canonical definition or canonical implementation has changed, requires updated coverage to maintain the topical signal.

Figure: Distribution of Entity Coverage in a Sample Audit (Practitioner Estimate)

The shape of the distribution tells the audit's story. A site with most head entities in the deep-coverage bucket and most tertiary entities in shallow or no coverage is healthy and operating efficiently. A site with head entities in shallow or missing coverage and many tertiary entities in medium coverage is upside-down: it has spent editorial budget on the less important entities and left the load-bearing parts of the topic uncovered. The corrective action is to consolidate or retire the over-built tertiary coverage and to invest in the missing head entities.
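The healthy and upside-down shapes can be checked mechanically once each entity has a tier and a depth bucket. A minimal sketch: the row format, the function names, and the "more than half" cut-off used to define "most" are assumptions made for illustration.

```python
from collections import Counter


def coverage_distribution(rows):
    """Count entities per (tier, depth bucket) pair. Rows are (entity, tier, depth)."""
    return Counter((tier, depth) for _, tier, depth in rows)


def is_upside_down(rows) -> bool:
    """True when most head entities are shallow or missing while most
    tertiary entities sit in medium or deep coverage."""
    dist = coverage_distribution(rows)

    def share(tier, depths):
        total = sum(n for (t, _), n in dist.items() if t == tier)
        hit = sum(n for (t, d), n in dist.items() if t == tier and d in depths)
        return hit / total if total else 0.0

    return (share("head", {"none", "shallow"}) > 0.5
            and share("tertiary", {"medium", "deep"}) > 0.5)


# Hypothetical audit rows.
healthy = [("a", "head", "deep"), ("b", "head", "deep"), ("c", "head", "medium"),
           ("d", "tertiary", "shallow"), ("e", "tertiary", "none")]
upside_down = [("a", "head", "shallow"), ("b", "head", "none"), ("c", "head", "deep"),
               ("d", "tertiary", "medium"), ("e", "tertiary", "medium"),
               ("f", "tertiary", "shallow")]

print(is_upside_down(healthy), is_upside_down(upside_down))
```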

Figure: Topical Visibility Trajectory: Entity-First vs. URL-Count Programmes (Practitioner Estimate)

The trajectory pattern that recurs across partner engagements is consistent: URL-count programmes show faster early gains (the long-tail rankings come in quickly when the site is publishing volume), then plateau between months 9 and 15 as the search system fails to reward additional URLs in the absence of structural authority. Entity-first programmes underperform in the first 6 months because the head-entity work is heavier and ships slower; they begin to pull ahead between months 9 and 12; by month 18 to 24 the gap is meaningful and widening. The compounding mechanism is that head-entity authority makes long-tail rankings cheaper, while URL-count programmes have to keep publishing forever to maintain the same level of visibility.

Semantic Completeness Within a Page

Coverage at the site level is one half of the audit. Semantic completeness within a page is the other half. A page that addresses an entity but addresses it incompletely (missing the key attributes, missing the canonical relationships, missing the questions that users canonically ask about the entity) does not establish topical authority for the entity even if the page is long and well-written.

The semantic-completeness audit, for any given page, asks four questions. First, does the page cover the canonical attributes of its primary entity? For a page about "marketing mix modelling," the canonical attributes include the modelling techniques used, the data inputs required, the typical accuracy ranges, the strengths and weaknesses relative to alternatives, the implementation complexity, and the typical organisational contexts in which it is deployed.

Second, does the page cover the canonical questions users ask about the entity? The People Also Ask surface, the related-search surface, and the Quora and Reddit discussions on the entity collectively reveal the question population. A page that addresses 80 percent of the canonical questions is closer to topically complete than a page that addresses 20 percent of them, all else equal.

Third, does the page connect the entity to its relational neighbours? An entity exists in a relational graph: marketing mix modelling relates to multi-touch attribution, to incrementality testing, to causal inference, to media planning, to budget allocation. A page about MMM that does not mention these adjacent entities is structurally orphaned in the knowledge graph; a page that names them and links to dedicated coverage of each is structurally integrated.

Fourth, does the page surface the canonical examples and edge cases that practitioners would recognise as evidence of expertise? A page on dynamic pricing that does not mention the canonical case examples (airlines, ride-sharing, e-commerce price-personalisation), the canonical methods (auction, posted-price, contextual bandit), and the canonical edge cases (price fairness audits, regulatory exposure, customer-segment effects) is not topically authoritative regardless of how many words it contains.

The semantic-completeness check, applied to the head-entity pages, is often where the highest-leverage interventions live. A head-entity page that is missing 40 percent of its canonical attributes, half of its canonical question coverage, and most of its relational connections can be lifted from a mediocre ranker to a strong ranker by a substantive expansion that addresses the gaps. The intervention is heavier than a freshness pass but lighter than a new build, and the success rate in partner data is high because the diagnostic is precise.

Semantic-Completeness Rubric for Head-Entity Pages

Dimension	Specification	Failure Mode	Operational Check
Attribute coverage	Page addresses 70-90% of the entity’s canonical attributes	Page reads as marketing fluff with no operational substance	Compare page section list to the canonical attribute checklist; flag missing attributes
Question coverage	Page answers the 8-15 canonical questions users ask about the entity	Page treats one or two questions in depth and ignores the rest	Pull People Also Ask and related-search for representative queries; check page against the question set
Relational connection	Page names and links to 6-12 directly adjacent entities in the topic graph	Page is a topical island with no internal inbound or outbound links	Diagram the topic graph; verify page-to-page links match the relational structure
Canonical examples and edge cases	Page surfaces the 3-7 examples and edge cases practitioners would recognise as expertise signal	Page reads as generic explainer; expert readers do not engage	Survey expert-written competitor pages; verify the canonical example set is covered
Author and citation signals	Page surfaces author expertise and cites authoritative sources for non-obvious claims	Page makes confident assertions with no provenance; expert readers discount the claims	Audit citation density and author bio; verify links to canonical sources

The rubric is deliberately rigorous. Most pages in most topical audits fail the rubric on two or three dimensions, and the editorial work that fixes them is substantive (typically 8 to 25 hours per page for a serious head-entity expansion). The investment is justified by the compounding visibility lift: a head-entity page that passes the rubric tends to anchor a topical cluster's rankings for years, while a head-entity page that fails the rubric tends to drift down in rankings over the same period as competitors catch up and surpass it.
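The rubric above lends itself to a simple pass/fail check per dimension. The sketch below uses the lower bound of each range in the table as the pass threshold, which is an assumption (the table gives ranges, not cut-offs); the `PageAudit` fields and the `rubric_failures` name are hypothetical, and the counts are expected to come from a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class PageAudit:
    attributes_covered: int
    attributes_total: int
    questions_answered: int
    adjacent_entities_linked: int
    examples_and_edge_cases: int
    has_author_bio: bool
    claims_cited: bool


def rubric_failures(page: PageAudit) -> list[str]:
    """Names of the rubric dimensions the page fails, in table order."""
    failures = []
    if page.attributes_total == 0 or page.attributes_covered / page.attributes_total < 0.70:
        failures.append("attribute coverage")
    if page.questions_answered < 8:
        failures.append("question coverage")
    if page.adjacent_entities_linked < 6:
        failures.append("relational connection")
    if page.examples_and_edge_cases < 3:
        failures.append("canonical examples and edge cases")
    if not (page.has_author_bio and page.claims_cited):
        failures.append("author and citation signals")
    return failures


# Hypothetical reviewer counts for one head-entity page.
mmm_page = PageAudit(attributes_covered=6, attributes_total=10, questions_answered=9,
                     adjacent_entities_linked=2, examples_and_edge_cases=4,
                     has_author_bio=True, claims_cited=False)
print(rubric_failures(mmm_page))
```

Returning the failed dimensions by name, not a single score, keeps the output usable as the skeleton of an expansion brief.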

The Audit Workflow

The audit, as a standing practice, has a defined workflow. The workflow below is the version that has produced consistent results across advisory partner engagements.

Step one is the topic definition. The topic is the unit of analysis: not a single keyword, not a single page, but a coherent cluster of related queries and entities. The topic boundary is a judgement call; a working heuristic is to draw the boundary where users would expect a different source to be authoritative. "Email marketing" is a topic; "the deliverability sub-topic within email marketing" is a sub-topic that may warrant its own audit if the operator competes meaningfully on it.

Step two is the entity enumeration described earlier under Building the Entity Population: combine the Wikipedia, Google surface, competitor, and practitioner-literature inputs into a working list of 150 to 400 entities, classified by head, secondary, and tertiary.

Step three is the coverage scoring: for each entity, score the site's coverage on the four dimensions (presence, depth, prominence, freshness). This step is labour-intensive on the first pass (roughly 5 to 15 hours per 100 entities, or 10 to 30 hours for a topic of about 200 entities) and faster on subsequent passes when the spreadsheet structure is in place.

Step four is the semantic-completeness check on the head-entity pages. For each head-entity page, score the four sub-dimensions (attributes, questions, relations, examples), check the rubric's author and citation signals, and identify the specific gaps.

Step five is the gap prioritisation. Combine the entity-level and page-level gaps into a ranked queue, weighted by entity importance (head versus tertiary), gap severity (missing coverage versus shallow coverage), strategic value (commercial proximity, conversion path), and editorial effort.
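One possible shape for the weighting is a single priority score per gap. Everything numeric here is an assumption chosen for illustration: the tier and severity weights, the 0-to-1 strategic-value scale, and the 20-hour reference used to dampen the effort penalty so that cheap tertiary fixes do not automatically outrank heavy head-entity work.

```python
from dataclasses import dataclass

# Hypothetical weights; tune them to the programme.
TIER_WEIGHT = {"head": 3.0, "secondary": 2.0, "tertiary": 1.0}
SEVERITY_WEIGHT = {"missing": 2.0, "shallow": 1.0}
EFFORT_REFERENCE_HOURS = 20.0


@dataclass
class Gap:
    entity: str
    tier: str               # "head", "secondary", or "tertiary"
    severity: str           # "missing" or "shallow"
    strategic_value: float  # 0.0 (none) to 1.0 (directly on the conversion path)
    effort_hours: float


def priority(gap: Gap) -> float:
    benefit = TIER_WEIGHT[gap.tier] * SEVERITY_WEIGHT[gap.severity] * (1 + gap.strategic_value)
    effort_penalty = 1 + gap.effort_hours / EFFORT_REFERENCE_HOURS
    return benefit / effort_penalty


gaps = [
    Gap("view-through window", "tertiary", "shallow", 0.2, 2),
    Gap("marketing mix modelling", "head", "missing", 0.8, 20),
    Gap("incrementality testing", "secondary", "shallow", 0.5, 10),
]
queue = sorted(gaps, key=priority, reverse=True)
for gap in queue:
    print(f"{priority(gap):.2f}  {gap.entity}")
```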

Step six is the production queue. Convert the ranked gap list into specific briefs: entity X requires a new pillar page of approximately Y words; head-entity page Z requires an expansion of approximately N words covering attributes A, B, and C; tertiary entities P, Q, and R can be consolidated into a single overview page rather than getting individual pages.

Figure: Topical authority audit workflow

The workflow is iterative. The first pass produces a rough map; the second pass, six to twelve months later, refines the entity list (Google adds entities to the topic surface as the topic evolves), updates the coverage scores, and re-prioritises. Topical authority is a moving target because topics evolve; the audit is a standing practice rather than a one-time project.

The labour distribution in a first-pass audit on a topic with ~200 entities is roughly: entity enumeration 6 to 14 hours, coverage scoring 10 to 30 hours, semantic-completeness check on 20 to 40 head-entity pages 30 to 80 hours, prioritisation and brief writing 8 to 20 hours. The full first pass typically takes 60 to 150 analyst hours depending on the topic's breadth and the existing CMS structure's legibility. Subsequent passes drop to 20 to 50 hours because the entity list, scoring template, and brief structure are already in place. The cost is non-trivial, and it is recouped many times over when the production queue is correctly prioritised rather than driven by what felt urgent in last week's editorial meeting.
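As a quick arithmetic check, the itemised ranges above can be summed directly. The dictionary below simply restates the quoted figures; the itemised total comes to 54 to 144 hours, which is in line with the rounded 60 to 150 hours given for the full first pass.

```python
# (low, high) hour ranges quoted for a first pass on a topic with ~200 entities.
first_pass_hours = {
    "entity enumeration": (6, 14),
    "coverage scoring": (10, 30),
    "semantic-completeness check": (30, 80),
    "prioritisation and brief writing": (8, 20),
}

low = sum(lo for lo, _ in first_pass_hours.values())
high = sum(hi for _, hi in first_pass_hours.values())
print(f"Itemised first pass: {low} to {high} analyst hours")
```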

Common Failure Modes

A small number of failure modes recur across audits we have seen in advisory work. Naming them explicitly helps avoid them.

The first failure is treating the URL list as the entity list. Operators sort their content management system by category or tag, count the URLs in each, and conclude that the topic with 47 URLs is well-covered. The URL count is not the coverage measure. Two of those 47 URLs may be the only ones that cover load-bearing head entities; the other 45 may overlap on a small number of secondary entities. The audit has to look at the entity population, not the URL list.

The second failure is over-investment in the long tail at the expense of the head. The CMS dashboard and the SEO toolkit's keyword tracker both reward writing about long-tail terms because the long-tail terms have low competition and the operator can rank quickly. The compounding consequence is that the long-tail expansion happens before the head-entity foundations are strong, and the long-tail rankings stall because the search system does not yet read the site as topically authoritative. The right order is head entities first, secondary entities second, long-tail expansion third, on the basis that the long-tail benefits from the topical authority that head and secondary coverage establish.

The third failure is the missing relational connection. A site can have substantive coverage of 200 entities on a topic with essentially no internal linking between them. Each page is a topical island; the search system can read individual pages as substantive but cannot read the site as a coherent topical authority because the graph structure is absent. The fix is internal linking that reflects the entity relationships: a page on multi-touch attribution should link to causal inference, to incrementality testing, to last-click attribution, to marketing mix modelling, with anchor text that names the entity rather than generic "click here" language.
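The linking gap can be found by comparing the entity graph with the site's actual internal links. In this sketch the relationships are treated as bidirectional (an assumption; some programmes may only want links from the more specific page to the more general one), and the entity names, URLs, and the `missing_links` helper are hypothetical.

```python
def missing_links(entity_edges, entity_page, page_links):
    """(source_page, target_page) pairs implied by the entity graph but absent on the site."""
    missing = set()
    for a, b in entity_edges:
        for src, dst in ((a, b), (b, a)):  # assumption: relationships link both ways
            src_page, dst_page = entity_page.get(src), entity_page.get(dst)
            if not src_page or not dst_page or src_page == dst_page:
                continue  # one entity has no page yet, or both share a page
            if dst_page not in page_links.get(src_page, set()):
                missing.add((src_page, dst_page))
    return sorted(missing)


# Hypothetical entity relationships, entity-to-page mapping, and crawled links.
entity_edges = [
    ("multi-touch attribution", "incrementality testing"),
    ("multi-touch attribution", "last-click attribution"),
]
entity_page = {
    "multi-touch attribution": "/mta",
    "incrementality testing": "/incrementality",
    "last-click attribution": "/last-click",
}
page_links = {
    "/mta": {"/incrementality"},
    "/incrementality": set(),
    "/last-click": {"/mta"},
}

for source, target in missing_links(entity_edges, entity_page, page_links):
    print(f"add link: {source} -> {target}")
```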

The fourth failure is the cannibalisation cluster. The site has six articles whose primary entity is the same (the canonical "what is X" article exists in six versions, written at different times by different writers with no consolidation). The search system selects one of them to rank and ignores the others; the others split the link equity and dilute the topical signal. The fix is consolidation: pick the strongest version, fold the unique content from the others into it, redirect the rest to it.
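Detecting the clusters and drafting the redirect mapping is a grouping exercise. The sketch below assumes each page has already been tagged with a primary entity and uses clicks as the measure of "strongest", which is an assumption; links, conversions, or editorial judgement may be better selectors. Folding the unique content into the surviving page remains manual work.

```python
from collections import defaultdict


def redirect_map(pages, strength_key="clicks"):
    """Map each redundant URL to the strongest URL sharing its primary entity."""
    by_entity = defaultdict(list)
    for page in pages:
        by_entity[page["primary_entity"]].append(page)

    redirects = {}
    for cluster in by_entity.values():
        if len(cluster) < 2:
            continue  # no cannibalisation for this entity
        canonical = max(cluster, key=lambda p: p[strength_key])
        for page in cluster:
            if page is not canonical:
                redirects[page["url"]] = canonical["url"]
    return redirects


# Hypothetical page inventory.
pages = [
    {"url": "/what-is-mta", "primary_entity": "multi-touch attribution", "clicks": 900},
    {"url": "/mta-explained-2019", "primary_entity": "multi-touch attribution", "clicks": 120},
    {"url": "/mta-guide", "primary_entity": "multi-touch attribution", "clicks": 40},
    {"url": "/incrementality", "primary_entity": "incrementality testing", "clicks": 300},
]
print(redirect_map(pages))
```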

The fifth failure is the over-formal entity list. Operators who treat the entity list as a sacred ontology become precious about additions and changes, and the entity list becomes outdated. The entity list is a working document; new sub-topics get added as the topic evolves, deprecated sub-topics get removed, and the list reflects the current state of the topic rather than a historical snapshot.

From Audit to Production

The audit produces a ranked queue of interventions: new pillar pages for missing head entities, expansion briefs for under-developed head-entity pages, consolidation moves for cannibalisation clusters, and internal linking improvements that connect the entity graph. Converting the queue to production has its own discipline.

The scoping rule that has worked in partner engagements is to take the top 40 to 60 items from the audit, scope each as a specific brief, and ship them over a six to nine month editorial horizon. Larger queues tend to outpace editorial capacity and become aspirational documents; smaller queues miss the compounding benefit of substantial coverage build-out.

The briefs should be specific to the gap. A brief for a missing head-entity pillar page specifies the entity, the canonical attributes the page must cover, the related entities to link to, the question population to address, and the depth target (typically 2,500 to 6,000 words for a head entity). A brief for an expansion specifies the existing page, the specific sub-sections to add, and the entities to introduce. A brief for a consolidation specifies the source pages, the canonical destination, the unique content to migrate, and the redirect mapping.

The production should ship in batches rather than serially. A batch of 8 to 15 pieces shipped over a one to two month window establishes a topical signal of activity that the search system reads as freshness and depth; a slow drip of one or two pieces per month produces the same total output with less observable signal. The editorial discipline that matters is to maintain quality at scale, which is harder than it sounds and is the place where most topical-authority build-outs founder.

The measurement of programme success is at the topic level rather than the URL level: aggregate impressions, clicks, and visibility on the topic cluster as a whole, tracked over rolling six-month windows. URL-level metrics are useful for tactical decisions but not for assessing whether the topical-authority programme is working. The right metric is whether the site's overall topical visibility (against a defined competitor set, on a defined query universe) is improving.

Topical authority is a graph property, not a list property. Sites that audit their content as a graph of entities and relationships build authority that compounds; sites that audit their content as a list of URLs tend to produce thin pages and stalled growth.

The compounding mechanism is structural. A site that establishes deep head-entity coverage in the first six months earns the topical signal that lets the next six months' long-tail expansion rank faster than it would have otherwise. A site that establishes interconnected secondary-entity coverage in the second six months earns the relational structure that makes the search system read the site as the canonical destination for the topic. The programme that runs the audit, ships the prioritised interventions, and measures topical visibility over rolling windows tends to produce results in the 18 to 36 month horizon that no amount of unstructured publishing can match.

Measuring Topical Visibility

The right metric for topical authority is hard to build well and most operators settle for the wrong one. Aggregate organic traffic is a noisy measure because it conflates topical authority with seasonality, with brand demand, and with the ranking volatility of any single high-value query. URL-level rankings on a small set of head terms are misleading because they ignore the long tail where the topical-authority effect is largest. The metric that approximates the underlying construct most closely is a query-universe visibility index against a defined competitor set.

The construction is straightforward in principle and labour-intensive in practice. Define a query universe of 500 to 2,000 queries that span the topic (head terms, secondary terms, long-tail variants, question-form queries). Define a competitor set of 3 to 8 sites that compete on the topic. Track ranking position on every query in the universe across the operator site and the competitor set on a weekly or biweekly cadence. Aggregate to a visibility score (the Sistrix Visibility Index, Semrush's Visibility metric, or a custom equivalent that weights queries by search volume and rank-CTR curve) and track the score over time.
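A custom equivalent can be as simple as volume-weighted expected clicks divided by the ceiling the site would reach if it ranked first on every query. This is a sketch of that idea only, not a reproduction of any vendor's index; the CTR-by-rank values, the query names, and the volumes are placeholder assumptions that should be replaced with a curve and data appropriate to the market.

```python
# Hypothetical rank-to-CTR curve; ranks beyond 10 contribute nothing.
CTR_BY_RANK = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
               6: 0.04, 7: 0.03, 8: 0.025, 9: 0.02, 10: 0.015}


def visibility(rankings: dict, volumes: dict) -> float:
    """Expected clicks captured as a share of the rank-1-everywhere ceiling.

    rankings: query -> rank, with the query absent or None when the site does not rank.
    volumes:  query -> monthly search volume for every query in the universe.
    """
    captured = sum(volume * CTR_BY_RANK.get(rankings.get(query), 0.0)
                   for query, volume in volumes.items())
    ceiling = sum(volumes.values()) * CTR_BY_RANK[1]
    return captured / ceiling if ceiling else 0.0


# Hypothetical three-query universe.
volumes = {"marketing attribution": 1000, "mta vs mmm": 500, "what is incrementality": 200}
site_rankings = {"marketing attribution": 1, "mta vs mmm": 3}

print(f"visibility: {visibility(site_rankings, volumes):.3f}")
```

Because the denominator depends only on the query universe, scores for the operator site and each competitor are directly comparable as long as the universe is the same.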

The visibility-against-competitors framing is what makes the metric diagnostic rather than descriptive. A flat visibility line during a year when competitors all dropped 25 percent is a topical-authority win; a flat visibility line during a year when competitors all gained 40 percent is a topical-authority loss. The aggregate-traffic chart cannot distinguish these two cases; the competitive visibility index can.
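The two flat-line cases can be separated by subtracting the competitor set's typical change from the site's own change. A minimal sketch, assuming start-of-year and end-of-year visibility scores are available for each site; using the median competitor change as the benchmark is an assumption, and the scores are invented.

```python
from statistics import median


def pct_change(start: float, end: float) -> float:
    return (end - start) / start


def relative_change(site, competitors) -> float:
    """Site's visibility change minus the median competitor change.

    site: (start, end) visibility; competitors: list of (start, end) pairs.
    """
    benchmark = median(pct_change(start, end) for start, end in competitors)
    return pct_change(*site) - benchmark


flat_site = (10.0, 10.0)  # flat visibility line over the year
falling_competitors = [(20.0, 15.0), (8.0, 6.0), (40.0, 30.0)]  # each down 25 percent
rising_competitors = [(20.0, 28.0), (10.0, 14.0), (5.0, 7.0)]   # each up 40 percent

print(f"{relative_change(flat_site, falling_competitors):+.0%}")
print(f"{relative_change(flat_site, rising_competitors):+.0%}")
```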

A practical implementation detail: the query universe should be defined once and then largely held constant for the metric to be comparable across time. Adding queries because the site started ranking for them, or removing queries because the site stopped ranking, both contaminate the metric. The discipline is to define the universe based on the topic itself (what queries should a topically authoritative site rank for?) and to update the universe periodically (annually, typically) to reflect topic evolution rather than continuously to reflect the operator site's current rankings.

The Sistrix Visibility Index, the Similarweb traffic share metrics, the Ahrefs site-level keyword visibility, and the Semrush domain-level visibility scores are all useful approximations that share the basic shape of the construct. Each has known limitations (different SERP feature handling, different rank-tracking depth, different geographic emphasis), and the right operating posture is to track two or three of them in parallel rather than to treat any one as ground truth. The directional agreement across the indices is the signal; the per-index level is noisier than it appears.
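Directional agreement can be checked without comparing levels at all: reduce each index to "up", "down", or "flat" over the window and see whether the labels match. The index names, readings, and the 2 percent tolerance band below are illustrative assumptions, not values from any of the named tools.

```python
def direction(series, tolerance=0.02):
    """'up', 'down', or 'flat' from the relative first-to-last change of one index."""
    change = (series[-1] - series[0]) / series[0]
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"


def directional_agreement(indices):
    """Return (verdict, per-index directions); verdict is 'mixed' when indices disagree."""
    directions = {name: direction(values) for name, values in indices.items()}
    unique = set(directions.values())
    verdict = unique.pop() if len(unique) == 1 else "mixed"
    return verdict, directions


# Hypothetical readings from three indices on different scales.
agreeing = {"index_a": [1.20, 1.32, 1.45], "index_b": [40.0, 43.0, 47.5],
            "index_c": [0.80, 0.86, 0.91]}
disagreeing = {"index_a": [1.20, 1.32, 1.45], "index_b": [40.0, 39.0, 36.0]}

print(directional_agreement(agreeing)[0])
print(directional_agreement(disagreeing)[0])
```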

Key Takeaways

Topical authority is an entity-coverage and semantic-completeness problem, not a URL-count problem. Counting URLs as a proxy for authority is at best uninformative and at worst actively misleading.
The entity population for a topic is built from four inputs in combination: Wikipedia, Google's own entity surface (Knowledge Panel, People Also Ask, related searches), top-ranking competitor content, and the practitioner literature. Combining them produces a working list of 150 to 400 entities classified as head, secondary, or tertiary.
The coverage audit scores each entity on four dimensions (presence, depth, prominence, freshness), and the semantic-completeness audit scores each head-entity page on four sub-dimensions (attributes, questions, relations, examples). Both layers are required.
The audit workflow is iterative: define topic, enumerate entities, score coverage, check semantic completeness, prioritise gaps, ship the production queue. The first pass produces a rough map; subsequent passes refine as the topic evolves.
The recurring failure modes include treating the URL list as the entity list, over-investing in the long tail before head entities are strong, leaving the entity graph unlinked, accumulating cannibalisation clusters, and treating the entity list as a sacred ontology rather than a working document.
The production discipline is to ship 40 to 60 items from the audit over a six to nine month horizon in batches of 8 to 15, with head-entity pillar pages shipped first. Programmes that ship long-tail expansion before head-entity coverage tend to stall.
Measure programme success at the topic level (aggregate visibility against a competitor set on a query universe), not the URL level. Topical authority compounds over 18 to 36 months and is the structural prerequisite for the long-tail rankings that most operators chase first.