Internal Linking Architecture for Content Moats: Beyond Hub-and-Spoke

TL;DR: The hub-and-spoke pattern (one pillar page linked bidirectionally to a cluster of supporting articles) is a useful starter architecture for content properties under roughly two hundred pages, after which it produces bimodal in-link distributions (saturated pillars, thinly linked spokes), weak page-to-page lateral linking, and a graph that fails to consolidate topical authority above the spoke level. Mature content moats use a deeper architecture: layered hub topologies with cross-cluster bridges, contextual mid-body linking on a budget per page, anchor-text portfolios that vary by ranking goal, and an explicit measurement loop on internal PageRank-equivalent centrality. The mechanical work is in modeling the graph as a graph, not as a content taxonomy.

A note on the named companies and sources. Google's PageRank documentation, the original Brin and Page 1998 paper, Bill Slawski's published patent analysis (Go Fish Digital, archived), Cyrus Shepard and Moz's internal-link research, Kevin Indig's published work on topical authority, and the Reasonable Surfer patent literature appear throughout as available public reference points. Quantitative ranges framed as advisory-engagement observation come from anonymized partner content operators in the 500 to 80,000 indexed-page range, across publishing, SaaS, and e-commerce verticals.

Where Hub-and-Spoke Comes From, and Where It Breaks

The hub-and-spoke pattern entered the content SEO vocabulary around 2017 to 2019 as the "topic cluster" framework, promoted heavily by HubSpot's marketing team and reinforced by subsequent agency content. The pattern is straightforward: one comprehensive "pillar" page targets the head term for a topic; a cluster of supporting articles targets the long-tail variants; the pillar links down to every supporting article, every supporting article links back up to the pillar. The result is a star graph with the pillar at the center.

The pattern works for a specific reason: it consolidates internal link equity on the pillar (every spoke contributes an inbound link), and it semantically clusters the supporting content (every spoke is one hop from the pillar, every spoke is two hops from every sibling spoke through the pillar). For a small-to-medium content property in a single topic, this is sufficient. The pillar accumulates ranking signal; the spokes capture long-tail traffic; the user's path through the content is legible.

The pattern fails in three predictable ways past roughly the two-hundred-page mark. First, the pillar pages become congested: a pillar linked to and from sixty spokes splits its outbound equity sixty ways, each thinly linked spoke has little equity of its own to pass back, and the marginal sixty-first spoke contributes essentially no additional ranking value to the pillar. Second, the spoke-to-spoke distance is always two hops through the pillar, which is too far for lateral signal flow and creates an obvious "two-hop tax" on related-but-not-identical queries. Third, the architecture has no slot for cross-cluster relationships: a property covering "B2B SaaS pricing" and "B2B SaaS onboarding" as two separate clusters has no way to express that the two clusters share readership, share keywords, and should flow signal to each other.

Contrary to the Conventional View

Conventional view

Hub-and-spoke topic clusters are the right architecture for topical authority.

What the evidence shows

Hub-and-spoke is a starter pattern, not a target pattern. In partner data on content properties that scaled from 50 to 5,000 indexed pages, the hub-and-spoke implementations stopped delivering marginal ranking lift somewhere between 180 and 320 pages, after which the pillar pages plateaued and the new spokes increasingly failed to rank at all. The properties that continued to compound ranking improvements past that mark had restructured to layered hub topologies with explicit cross-cluster bridges, contextual mid-body linking on every page, and a deliberate anchor-text portfolio strategy. The HubSpot-popularized framing is correct as a phase 1 architecture and wrong as a phase 2 architecture.

PageRank as a Mental Model, Not a Recipe

The mental model that produces useful internal-linking decisions is the PageRank one, not the topic-cluster one. PageRank, as Brin and Page formulated it in 1998 and as Google internally evolved it through Reasonable Surfer and subsequent refinements, treats link equity as a flow problem: each page distributes its accumulated equity across its outbound links, with damping, and the equilibrium distribution is the PageRank vector. The original paper split each page's equity evenly across its outbound links and applied a single damping factor to every page (the paper suggests 0.85); the Reasonable Surfer patent extended the model to weight links by their probability of being clicked, which depends on link position, link prominence, anchor text relevance, and so on.

The practical implication of the PageRank framing is that a page's link equity is not just a function of how many pages link to it; it is a function of the equity of the pages that link to it, divided by their outbound link counts. A link from a page with high equity and few outbound links is worth more than a link from a low-equity page with many outbound links. A page that has accumulated equity from external backlinks distributes that equity across its outbound internal links; the architecture decision is which pages get those outbound links.
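The flow described in the two paragraphs above can be sketched as a power iteration. This is a minimal textbook version, not Google's implementation: the even split across outbound links and the 0.85 damping value follow the original formulation, while the handling of pages with no outbound links (their equity is spread evenly over all pages) and the toy site below are assumptions made for illustration.

```python
# Minimal power-iteration PageRank over an internal link graph (textbook sketch).
# Assumptions: `graph` maps each URL to the URLs it links to; each page splits its
# equity evenly across its distinct outbound links; pages with no outbound links
# ("dangling" pages) spread their equity evenly over all pages; d = 0.85.

def pagerank(graph, d=0.85, tol=1e-10, max_iter=200):
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Equity held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for src, targets in graph.items():
            outs = set(targets)
            if not outs:
                continue
            share = d * rank[src] / len(outs)  # source equity / outbound link count
            for dst in outs:
                new[dst] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical site: "guide" and "hub" receive identical equity from the homepage,
# but "guide" has one outbound link and "hub" has four.
site = {
    "home": ["guide", "hub"],
    "guide": ["target-a"],
    "hub": ["target-b", "x1", "x2", "x3"],
    "target-a": ["home"],
    "target-b": ["home"],
    "x1": ["home"],
    "x2": ["home"],
    "x3": ["home"],
}
ranks = pagerank(site)
print(ranks["target-a"] > ranks["target-b"])  # True
```

The two target pages sit behind sources of identical strength; the one whose source carries fewer outbound links ends up with more equity, which is the "divided by their outbound link counts" point in miniature.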

Google has stated publicly, through Gary Illyes and others, that internal PageRank (or its modern equivalent) remains a ranking input. The exact algorithm has evolved beyond the 1998 paper (Reasonable Surfer adds click-probability weights; subsequent refinements add quality and trust signals; the link graph is now one input to a much larger neural ranking system), but the mental model of "equity flows through links" is still operationally useful. The architecture work is to direct the equity flow toward the pages the operator wants to rank.

The mistake the topic-cluster framing makes is to treat the link decisions as a content-taxonomy problem rather than a flow-distribution problem. The two framings produce different architectures. The taxonomy framing asks "what does this content cluster look like?" and produces a hub-and-spoke pattern that mirrors the table of contents. The flow framing asks "where do I want the equity to land, and how do I route the links to get it there?" and produces a deeper, more directional architecture that does not necessarily mirror the content hierarchy.

The Layered Hub Topology

The architecture that has held up across partner engagements as content properties pass two hundred pages is what we have come to call a layered hub topology. The structure has three or four levels of hub pages, each level linking down to the level below and up to the level above, with explicit lateral bridges across hubs at the same level. The visual is closer to a layered directed graph than to a star.

Figure: Layered hub topology, three levels with cross-cluster bridges (advisory illustration)

The three properties of this topology that distinguish it from hub-and-spoke are the multiple hub levels, the explicit cross-cluster bridges at each level, and the leaf-to-leaf lateral links that connect semantically related content across the hub hierarchy. Each property changes the flow distribution in a specific way.

Multiple hub levels distribute link equity more evenly across the hub hierarchy. In a flat hub-and-spoke, all the spoke equity flows to one pillar, which then has to distribute outbound equity across many spokes; the pillar becomes a bottleneck. In a layered hub, equity flows from the leaves to the L2 hubs, which consolidate and pass equity to the L1 hubs, which consolidate and pass equity to the root. The path is longer but the consolidation is more progressive, and the L2 hubs accumulate enough equity to rank for medium-tail terms that no individual leaf could rank for.

Cross-cluster bridges connect related hubs at the same level. A bridge from the "B2B SaaS pricing" hub to the "SaaS marketing" hub passes equity across what would otherwise be siloed clusters, and it allows the two hubs to flow signal to each other on queries that span both topics. The bridges are not arbitrary; they are placed where the two clusters share readership, share keyword overlap, or share user journeys. Excessive bridging dilutes the signal; selective bridging compounds it.

Leaf-to-leaf lateral links connect specific articles across the hub hierarchy. A "CTR curve" leaf under the SEO hub might link laterally to an "anchor pricing" leaf under the B2B pricing hub if the two articles are conceptually related (both deal with reference-effect-driven user behavior). The lateral links bypass the hub hierarchy and create direct flow paths between related content. They are the lateral plumbing that the hub-and-spoke pattern explicitly lacks.
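The path-shortening effect of bridges and lateral links can be checked on a toy graph. The sketch below uses a hypothetical five-page site (the page names echo the examples above; the graph itself is invented) and measures only click distance, not equity: it counts the fewest clicks from the "CTR curve" leaf to the "anchor pricing" leaf when the clusters are siloed, after adding a hub-level bridge, and after adding a leaf-to-leaf lateral link.

```python
from collections import deque


def hops(graph, start, goal):
    """Fewest clicks from start to goal (breadth-first search), or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        if page == goal:
            return dist
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical site: two L1 hubs under the root, one leaf under each hub.
siloed = {
    "root": ["seo-hub", "pricing-hub"],
    "seo-hub": ["root", "ctr-curve"],
    "pricing-hub": ["root", "anchor-pricing"],
    "ctr-curve": ["seo-hub"],
    "anchor-pricing": ["pricing-hub"],
}
# Add a hub-level cross-cluster bridge, then a leaf-to-leaf lateral link.
bridged = {**siloed, "seo-hub": siloed["seo-hub"] + ["pricing-hub"]}
lateral = {**bridged, "ctr-curve": bridged["ctr-curve"] + ["anchor-pricing"]}

for label, g in [("siloed", siloed), ("bridged", bridged), ("lateral", lateral)]:
    print(label, hops(g, "ctr-curve", "anchor-pricing"))
# siloed 4
# bridged 3
# lateral 1
```

The bridge removes the detour through the root; the lateral link removes the hub hierarchy from the path entirely.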

The In-Link Distribution Per Page

The single most useful diagnostic for an internal-linking architecture is the in-link distribution per page. A healthy site has an asymmetric distribution: a small number of pages with many inbound links (the hubs), a larger number with moderate inbound link counts (the L2 and L3 pages), and a tail of pages with few inbound links (the deep leaves). An unhealthy site has either a flat distribution (every page has roughly the same in-link count, suggesting an automated linking layer with no curation) or a bimodal distribution (a few pages with hundreds of links, everything else with under three).
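The diagnostic in this paragraph needs only an edge list from a crawl export. Below is a minimal sketch, assuming the crawl has already been reduced to (source, target) URL pairs plus a list of every known URL; counting each linking page once per target and using a nearest-rank P90 are choices made for illustration, and crawl tools may count differently.

```python
import math
import statistics


def inlink_profile(pages, links, threshold=100):
    """Summarize the internal in-link distribution for a crawl.

    pages: every known URL, so pages with zero in-links are included.
    links: (source, target) pairs. Each linking page counts once per target and
    self-links are ignored (a simplifying choice; crawl tools differ here).
    """
    linkers = {p: set() for p in pages}
    for src, dst in links:
        if src != dst and dst in linkers:
            linkers[dst].add(src)
    counts = sorted(len(s) for s in linkers.values())
    p90 = counts[max(0, math.ceil(0.9 * len(counts)) - 1)]  # nearest-rank P90
    return {
        "median": statistics.median(counts),
        "p90": p90,
        "share_over_threshold": sum(c > threshold for c in counts) / len(counts),
    }


pages = ["home", "hub", "a", "b", "c"]
links = [("home", "hub"), ("a", "hub"), ("b", "hub"), ("c", "hub"),
         ("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("a", "b")]
print(inlink_profile(pages, links, threshold=3))
# {'median': 1, 'p90': 4, 'share_over_threshold': 0.2}
```

On a real crawl the default threshold of 100 matches the first column of the table that follows.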

Internal In-Link Distribution Patterns and What They Suggest (Across Advisory Partner Content Properties)

Pattern	Pages with over 100 in-links (share)	Median in-links	P90 in-links	Interpretation
Healthy layered hub	8 to 14% of L1+L2 hubs	5 to 9	23 to 38	Asymmetric, consolidated at hubs, well-served leaves
Hub-and-spoke (matured past 200 pages)	1 to 3% (pillar pages only)	2	11	Bimodal: pillars saturated, spokes thin
Flat automated linking	Under 1%	12 to 18	24	Suspicious flatness, tag-based widget dominating
Deep but disconnected silos	3 to 6%	3 to 4	7 to 11	Hubs exist but no cross-cluster flow
No architecture at all (legacy blog)	Under 1%	0.84	3.4	Editorial cross-links only, no design
Over-linked (mid-body excess)	12 to 18%	27	84	Anchor-text dilution, signal noise

The diagnostic value of this table is in its contrast. A site looking healthy on traffic but showing a flat in-link distribution is at risk: the automated linking layer is doing the work, the curated architecture is not, and the moat is shallow. A site showing the over-linked pattern (a median of 27 in-links per page, with hundreds at the high end) is paying an anchor-text-dilution tax that the operator may not have noticed. A site showing the legacy-blog pattern is leaving nearly all of its architectural ranking potential on the table.

The remediation for each pattern is different. The flat automated pattern needs curated cross-links added to mid-body content. The hub-and-spoke pattern needs an L2 layer added between the pillars and the spokes. The disconnected-silos pattern needs cross-cluster bridges. The legacy-blog pattern needs an architecture imposed retroactively, which is the largest project of the five. The over-linked pattern needs pruning, which is politically harder than adding (every team that owns a page has a reason to link to it from every other page).

Anchor Text as a Portfolio Problem

The second under-appreciated dimension of internal-linking architecture is the anchor text portfolio. A page that consistently receives the same exact-match anchor text from every inbound internal link is sending a strong (and potentially over-strong) topical signal; a page that receives a varied anchor portfolio sends a more natural signal that maps to multiple related queries.

The mechanism is two-sided. On the one hand, exact-match anchors tell Google what the destination page is about, and a consistent anchor portfolio makes the topical claim unambiguous. On the other hand, an exclusively exact-match portfolio looks unnatural (real editorial linking varies the anchor by context) and may be down-weighted by the anti-manipulation systems. The Penguin algorithm and its successors target manipulative anchor patterns; what is or is not manipulative is a graduated judgment, not a binary one, but the gradient runs from "natural variation" toward "manipulative concentration" as the exact-match share rises.

In advisory work across partner engagements, the anchor portfolios that have produced the most stable ranking outcomes cluster around the following distributions on internal links to a target page: 18 to 32 percent exact-match (using the target keyword verbatim), 22 to 38 percent partial-match (using the keyword plus modifiers), 18 to 28 percent branded (using the site name, page title, or related branding), 8 to 14 percent generic ("learn more", "see this guide"), and the rest as descriptive contextual phrases that mention the topic without keyword precision. The distribution is not a recipe; it varies by page type and by competitive context.

Figure: Internal Anchor Text Distribution for Target Pages That Sustained Position 1 to 3 Rankings (Across Advisory Partner Operators)

The exact-match share is the variable most worth tuning. A page with an exact-match share above roughly 45 percent on internal anchors is in a portfolio shape that has, in our experience, produced ranking instability under algorithm updates; the page ranks well between updates but is repeatedly hit when the anti-manipulation models recalibrate. A page with an exact-match share below 12 percent is leaving signal on the table and would likely rank higher with a few targeted exact-match anchors added. The 18 to 32 percent range is the middle band that has been most resilient.
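A portfolio audit of this kind can be sketched as a classifier plus a share calculation. The classification rules below are deliberately simple and hypothetical (the generic-phrase list, the rule order, and the brand-term matching are all assumptions), and real anchors would need spot-checking by hand; the thresholds are the ones given in this section.

```python
import re

GENERIC = {"learn more", "read more", "see this guide", "click here", "here"}  # illustrative


def classify_anchor(anchor, keyword, brand_terms):
    """Bucket one anchor into the portfolio categories above (deliberately simple rules)."""
    text = re.sub(r"\s+", " ", anchor).strip().lower()
    kw = keyword.lower()
    if text == kw:
        return "exact"
    if text in GENERIC:
        return "generic"
    if kw in text:
        return "partial"
    if any(term.lower() in text for term in brand_terms):
        return "branded"
    return "descriptive"


def portfolio(anchors, keyword, brand_terms):
    """Return per-bucket shares and a flag on the exact-match share."""
    buckets = [classify_anchor(a, keyword, brand_terms) for a in anchors]
    shares = {b: buckets.count(b) / len(buckets) for b in set(buckets)}
    exact = shares.get("exact", 0.0)
    if exact > 0.45:
        flag = "above ~45%: instability risk"
    elif exact < 0.12:
        flag = "below 12%: likely leaving signal on the table"
    elif 0.18 <= exact <= 0.32:
        flag = "inside the 18-32% band"
    else:
        flag = "between bands: review"
    return shares, flag


anchors = ["saas pricing", "saas pricing", "SaaS pricing models", "Acme guide",
           "learn more", "how to price a subscription product", "saas pricing",
           "tiered saas pricing examples", "Acme pricing handbook", "read more"]
shares, flag = portfolio(anchors, "saas pricing", ["Acme"])
print(shares["exact"], flag)  # 0.3 inside the 18-32% band
```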

The Reasonable Surfer Model and Link Position

The Reasonable Surfer patent (Google, 2010) extended the basic PageRank model with a click-probability weight per link. The conceptual claim is that not every link on a page is equally likely to be clicked by a real user; links higher on the page, links with more prominent visual treatment, links surrounded by relevant context, links with anchor text that matches the user's intent, and links in editorial body content are more likely to be clicked than links in the footer, the sidebar, or buried in a list of fifty other links. The Reasonable Surfer model weights link equity by the click probability, so a high-click-probability link transmits more equity than a low-click-probability one.

The operational implication is that link position matters. A contextual mid-body link in editorial content carries more signal than a footer link, which carries more signal than a sidebar widget link. The original 2010 patent is over a decade old, but Google has consistently treated link prominence and position as inputs to ranking, and the published work from Bill Slawski (until his passing in 2022) tracked the patent literature in detail.
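The Reasonable Surfer idea can be layered onto the same kind of power iteration by replacing the even split with weights. In the sketch below the weight per link position is an invented placeholder, not a value from the patent or from Google; the point is only the mechanism, in which a body link and a footer link on the same page pass unequal shares. With all weights equal it reduces to the even split of the original formulation.

```python
# Position weights are invented placeholders for illustration, not values from
# the Reasonable Surfer patent or from Google.
POSITION_WEIGHT = {"body": 1.0, "related": 0.5, "sidebar": 0.3, "footer": 0.1}


def weighted_pagerank(links, d=0.85, iters=100):
    """links: (source, target, position) triples. Each page splits its equity across its
    outbound links in proportion to POSITION_WEIGHT, a stand-in for click probability."""
    pages = sorted({p for src, dst, _ in links for p in (src, dst)})
    n = len(pages)
    out_weight = dict.fromkeys(pages, 0.0)
    for src, _, pos in links:
        out_weight[src] += POSITION_WEIGHT[pos]
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iters):
        # Pages with no outbound links spread their equity evenly (an assumption).
        dangling = sum(rank[p] for p in pages if out_weight[p] == 0)
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for src, dst, pos in links:
            new[dst] += d * rank[src] * POSITION_WEIGHT[pos] / out_weight[src]
        rank = new
    return rank


links = [
    ("article", "guide-a", "body"),    # contextual mid-body link
    ("article", "guide-b", "footer"),  # footer link on the same page
    ("guide-a", "article", "body"),
    ("guide-b", "article", "body"),
]
r = weighted_pagerank(links)
print(r["guide-a"] > r["guide-b"])  # True
```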

The architecture decision that follows is to allocate the mid-body editorial-context links carefully. Each article has a finite budget of body links it can carry without becoming a link farm; the empirical sweet spot in partner data sits at roughly 4 to 11 contextual mid-body links per long-form article (1,500 to 5,000 word range), with the count rising sub-linearly with article length. Pages with mid-body link counts above 20 to 25 show diminishing per-link signal and increasing anchor-text dilution. The discipline is in the editorial choice of which links earn the mid-body slot, with the rest pushed to "related articles" blocks where the click probability and signal weight are both lower.

The Reasonable Surfer model also implies that link sculpting (the 2008-2009 SEO practice of using nofollow on internal links to direct equity flow) does not work the way it was originally claimed. Google modified the nofollow handling in 2009 so that nofollowed links still consume a share of the page's outbound equity, even though they do not pass it. The practical implication is that "sculpting" by nofollowing low-priority links does not concentrate more equity on the followed links; it just wastes equity on the nofollowed paths. The contemporary practice is to remove links you do not want to count, not to nofollow them.
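The arithmetic behind the sculpting point is short enough to write out. The helper below is hypothetical and deliberately simplified (it ignores damping and position weights). It compares three cases for a page with ten outbound links, three of them low-priority: the pre-2009 sculpting assumption, the post-2009 handling with the three links nofollowed, and the same page with the three links removed.

```python
def equity_per_followed_link(equity, followed, nofollowed, model):
    """Equity each followed link passes from a page (damping and position weights ignored).

    "sculpting_claim": the pre-2009 assumption that nofollowed links drop out of the split.
    "post_2009": nofollowed links still take their share of the split but pass nothing.
    """
    if model == "sculpting_claim":
        return equity / followed
    if model == "post_2009":
        return equity / (followed + nofollowed)
    raise ValueError(f"unknown model: {model}")


# A hypothetical page with 10 outbound links, 3 of them low-priority.
print(equity_per_followed_link(1.0, 7, 3, "sculpting_claim"))  # 0.14285714285714285
print(equity_per_followed_link(1.0, 7, 3, "post_2009"))        # 0.1 (3 shares wasted)
print(equity_per_followed_link(1.0, 7, 0, "post_2009"))        # 0.14285714285714285 (links removed)
```

Only removal restores the one-seventh share per followed link; nofollowing leaves each followed link at one-tenth.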

Click Depth, Crawl Frequency, and the Long Tail

The third metric (after in-link distribution and anchor portfolio) is click depth, the minimum number of clicks from the homepage to reach a given page. Google representatives have said publicly that pages closer to the homepage are treated as more important, and in partner data pages with low click depth (1 to 3 clicks from home) are crawled more frequently than pages with high click depth (5 or more clicks from home); the crawl frequency in turn interacts with the refresh frequency in the index.

The architecture decision is to keep the click depth low for the pages that matter. The standard hub-and-spoke pattern produces a click depth of 2 for every spoke (home > pillar > spoke), which is shallow and good. The layered hub topology produces a click depth of 3 to 4 for the deepest leaves, which is still reasonable. The properties that get into click-depth trouble are large catalogs with deep faceted-navigation structures, where the actual page can be 6 to 9 clicks from the root, and pagination on long lists, where page 12 of a category listing might be 7 clicks deep.
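Click depth is a breadth-first search from the homepage over the crawled link graph. A minimal sketch, assuming the crawl has been reduced to a mapping of each URL to the URLs it links to; the paginated category below is hypothetical, and the depth-5 cutoff mirrors the flagging threshold used in the audit later in this piece.

```python
from collections import deque


def click_depths(graph, root="/"):
    """Minimum clicks from the root to every reachable URL (breadth-first search)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth


def depth_report(graph, root="/", limit=5):
    """URLs at `limit` clicks or deeper, plus URLs the crawl knows but cannot reach."""
    depth = click_depths(graph, root)
    known = set(graph) | {t for targets in graph.values() for t in targets}
    deep = sorted(p for p, dd in depth.items() if dd >= limit)
    unreachable = sorted(known - set(depth))
    return deep, unreachable


# Hypothetical paginated category: each listing page links to the next page and one product.
site = {"/": ["/shoes/"], "/shoes/": ["/shoes/?page=2"], "/old-guide/": ["/"]}
for i in range(2, 8):
    site[f"/shoes/?page={i}"] = [f"/shoes/?page={i + 1}", f"/shoes/item-{i}/"]

deep, unreachable = depth_report(site)
print(click_depths(site)["/shoes/?page=7"])  # 7
print(unreachable)  # ['/old-guide/']
```

In the toy category each extra page of pagination adds a click, so products listed on later pages cross the depth-5 line quickly; any URL in the unreachable list is an orphan candidate as well as a depth problem.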

Figure: Crawl Frequency as a Function of Click Depth From Root (Across Advisory Partner E-Commerce and Publishing Sites)

The drop from depth 3 to depth 4 is the most operationally significant; pages at depth 4 or deeper are crawled less than half as often as pages at depth 3, and the gap widens with each additional level. For a long-tail content property where the deep pages are the ones capturing the long-tail traffic, this matters: a leaf at depth 6 may go weeks between recrawls, and any content update (a price change, a new section, a fresh statistic) takes proportionally longer to propagate to the index.

The remediation is to flatten the click depth for the long-tail pages that matter. The mechanisms are contextual mid-body links from high-equity pages (a leaf linked from a popular blog post inherits a shorter effective click depth), breadcrumb navigation (which Google's crawl uses for path discovery and reports in Search Console), and dedicated "see also" blocks on related pages. Each mechanism creates a shorter path from the root to the leaf, and the minimum path is what counts.

From Experience

Advisory work, mid-tier publisher with 4,400 indexed articles, 2023

The publisher had a clean hub-and-spoke architecture across roughly 80 topic pillars with 30 to 60 spokes each. The aggregate traffic was healthy but the long-tail growth had stalled at around 1.8 million monthly organic sessions for nearly two years. The internal-link diagnostic showed a clean hub-and-spoke shape but a median click depth of 4.7 for the leaf articles, with the deepest 12 percent of articles at depth 6 or worse. The remediation took a quarter: rebuild the architecture into three layers of hubs, add contextual cross-links between related leaves under different hubs, add a "topics like this" block to every article with three to five hand-curated lateral links. After two quarters the median click depth dropped to 3.1 and the deep tail dropped to depth 5. Aggregate traffic grew 38 percent over the following three quarters, with the lift concentrated on the long-tail keywords that the previously-deep articles targeted.

Measuring Internal PageRank-Equivalent Centrality

The internal-linking work benefits enormously from a measurement loop. The standard tools (Screaming Frog, Sitebulb, OnCrawl, DeepCrawl) compute an internal PageRank-equivalent metric (different tools call it different things: Link Score, Page Strength, Internal PageRank), which is the equilibrium-distribution score on the operator's own link graph. The metric is not Google's actual PageRank (Google does not expose that), but it is a useful proxy for the equity-flow shape of the operator's architecture.

The measurement workflow that has been operationally useful is monthly: crawl the site, export the internal PageRank-equivalent metric for every URL, compare against the previous month, flag the URLs where the metric has dropped significantly (suggesting links were lost) and the URLs where it has risen significantly (suggesting an architectural improvement is taking effect). The deltas surface architecture-level changes that no individual content or technical change would account for.
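The monthly comparison step can be sketched as a diff over two exports. This assumes each crawl-tool export has been reduced to a {url: score} mapping produced with the same tool and settings, so scores are comparable month to month; the function name and the 25 percent relative-change threshold are illustrative choices, not a feature of any of the tools named above.

```python
def ipr_deltas(previous, current, threshold=0.25):
    """Diff two monthly exports of {url: ipr_score}.

    Flags URLs whose score moved by at least `threshold` (relative change) and lists
    URLs that appeared or disappeared between crawls.
    """
    dropped, rose = [], []
    for url in sorted(previous.keys() & current.keys()):
        before, after = previous[url], current[url]
        if before == 0:
            continue  # relative change undefined; review these by hand
        change = (after - before) / before
        if change <= -threshold:
            dropped.append((url, round(change, 3)))
        elif change >= threshold:
            rose.append((url, round(change, 3)))
    return {
        "dropped": dropped,
        "rose": rose,
        "new": sorted(current.keys() - previous.keys()),
        "gone": sorted(previous.keys() - current.keys()),
    }


prev = {"/pricing/": 0.08, "/guide/": 0.01, "/tag/misc/": 0.002, "/old/": 0.004}
curr = {"/pricing/": 0.05, "/guide/": 0.016, "/tag/misc/": 0.002, "/new/": 0.003}
report = ipr_deltas(prev, curr)
print([u for u, _ in report["dropped"]], [u for u, _ in report["rose"]],
      report["new"], report["gone"])
# ['/pricing/'] ['/guide/'] ['/new/'] ['/old/']
```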

Internal PageRank-Equivalent (IPR) Score Distribution Per Page Class (Across Advisory Partner Operators)

Page class	Median IPR-equivalent	P10	P90	Pages per site
Homepage	1.000 (baseline)	1.0	1.0	1
L1 hub pages	0.084	0.041	0.184	5 to 18
L2 hub pages	0.0234	0.011	0.054	24 to 84
High-traffic editorial leaves	0.0107	0.0044	0.027	180 to 1,400
Long-tail editorial leaves	0.0042	0.0018	0.0094	1,200 to 11,000
Glossary, definition, reference pages	0.0034	0.0011	0.0084	120 to 2,400
Tag-archive pages	0.0018	0.0007	0.0048	80 to 480
Author-archive pages	0.0014	0.0005	0.0038	20 to 240
Orphan pages (zero in-links)	Near zero	0	Near zero	Variable

The distributions are not stable across sites (the IPR baseline depends on the link graph structure, the total page count, and the crawl parameters), so the absolute numbers are less useful than the relative shape. A healthy site has IPR roughly tracking page importance: hubs above leaves, high-traffic leaves above long-tail leaves, leaves well above tag-archives. A site where the tag-archive pages have IPR comparable to or higher than the editorial leaves is leaking equity into a low-value layer.

The orphan-page count is the single highest-leverage diagnostic the IPR computation surfaces. An orphan page (zero inbound internal links) is a page that exists in the URL space (it may have inbound external links or be in the sitemap) but receives no signal from the internal architecture. Orphan pages are typically the result of editorial neglect (a piece was published, never linked from a hub, and slipped into the long tail), legacy templates (the old "tag" page that no longer exists in the navigation), or technical errors (a redirect rule that pointed every old URL to the homepage, leaving the actual content pages unreferenced). Each orphan is a candidate for either reconnection (link it from somewhere) or removal (delete or no-index it).
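Because a crawler only finds pages by following links, orphan detection needs a list of URLs from outside the crawl, such as the XML sitemap, analytics, or server logs. A minimal sketch, assuming a standard sitemap (not a sitemap index) and crawl links as (source, target) pairs; the helper names and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract <loc> values from a standard XML sitemap (not a sitemap index)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def find_orphans(known_urls, links):
    """Known URLs that no other crawled page links to. Self-links do not count."""
    linked = {dst for src, dst in links if src != dst}
    return sorted(set(known_urls) - linked)


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/guide/</loc></url>"
    "<url><loc>https://example.com/old-tag/</loc></url>"
    "</urlset>"
)
crawl_links = [
    ("https://example.com/", "https://example.com/guide/"),
    ("https://example.com/guide/", "https://example.com/"),
    ("https://example.com/old-tag/", "https://example.com/old-tag/"),
]
print(find_orphans(sitemap_urls(sitemap), crawl_links))  # ['https://example.com/old-tag/']
```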

Anchor Text Concentration and the Penguin Echo

A specific pattern worth naming is the anchor-text-concentration risk that internal linking can produce. A site that auto-generates "related articles" widgets from tag overlap will, over time, accumulate hundreds of inbound internal links with identical anchor text (the title of the target article, used as the link text in every widget). The aggregate effect is an internal link profile where one anchor phrase dominates by a wide margin, which can trigger the same anti-manipulation signals that external anchor concentration triggers.

In partner data on automated-linking implementations, the share of internal in-links using the exact target-page title as anchor text ranges from 60 to 92 percent on the affected pages. This is far above the natural editorial range (where the same target gets linked with varied anchor text depending on the linking page's context) and produces ranking instability under updates. The remediation is to programmatically vary the anchor text on automated widgets, drawing from a pool of variations per target page (title, short title, alternative keyword, partial match, descriptive phrase), so the aggregate anchor distribution looks more like editorial linking.
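One way to implement the variation pool is to pick a variant per (source, target) pair with a stable hash, so a given link keeps the same anchor across renders and recrawls while different linking pages get different variants. This is a sketch under assumptions: the helper name and the variant pool are hypothetical, and Python's built-in hash() is avoided because it is salted per process for strings.

```python
import hashlib
from collections import Counter


def pick_anchor(source_url, target_url, variants):
    """Deterministically choose an anchor variant for one (source, target) link."""
    digest = hashlib.sha256(f"{source_url}|{target_url}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


# Hypothetical variant pool for one target page, one entry per anchor type named above.
variants = [
    "SaaS pricing models",                              # title
    "pricing models",                                   # short title
    "subscription pricing",                             # alternative keyword
    "how to choose a SaaS pricing model",               # partial match
    "our breakdown of tiered and usage-based pricing",  # descriptive phrase
]
counts = Counter(
    pick_anchor(f"/blog/post-{i}/", "/saas-pricing-models/", variants) for i in range(500)
)
print(counts.most_common())
```

Repeating an entry in the pool raises its share, which is one way to hold the title or exact-match share inside a chosen band.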

The other pattern worth naming is internal-link velocity. A migration or restructuring that adds 50,000 internal links to a site in one cutover (the case when a new navigation system or a new related-articles widget rolls out) shows up as an internal-link velocity spike, and the spike can take Google weeks to recrawl and re-evaluate. During the re-evaluation window, the rankings on the affected pages frequently fluctuate as the link graph settles. The mitigation is to roll out architectural changes incrementally where possible, and to expect a settling period when a cutover is unavoidable.

The Cross-Cluster Bridge as a Specific Architectural Move

The single most impactful architectural move in the layered hub topology, in our advisory experience, is the cross-cluster bridge. The mechanism: a bridge is a contextual mid-body link from a leaf or hub in one cluster to a leaf or hub in a different cluster, where the two clusters are topically distinct but share readership or user-journey overlap. A "B2B SaaS pricing" leaf linking to a "B2B SaaS onboarding" leaf is a bridge; a "CTR curve" leaf linking to an "internal linking architecture" leaf is a bridge; a "cohort analysis" leaf linking to a "subscription tier design" leaf is a bridge.

The bridges do three things simultaneously. They pass equity from one cluster to the other, lifting the destination's IPR. They create lateral paths in the link graph that reduce average click depth across the site. They send a topical relevance signal that helps the destination page rank for queries at the intersection of the two clusters (the "pricing and onboarding" intersection queries, the "CTR and linking" intersection queries). The third effect is the most under-appreciated: bridges expand the addressable keyword space by making intersection queries rankable.

The discipline of bridge selection is editorial. A bridge that connects two unrelated clusters dilutes both; a bridge that connects two heavily-related clusters consolidates without adding new signal. The sweet spot is bridges that connect adjacent-but-distinct clusters where a meaningful share of readers of one cluster will also be interested in the other, and where the keyword overlap is non-trivial but the head-term overlap is small. The selection requires judgment about the reader and the keyword space, which is not amenable to automation.

In advisory work on content properties that scaled past 1,000 pages, the bridge density that produced the most consistent ranking lift was roughly 8 to 14 percent of mid-body links per article being cross-cluster (the rest being within-cluster). A property with 4 percent bridge density is under-bridged; a property with 30 percent bridge density is over-bridged and loses cluster coherence. The right density is in between, and the exact number depends on how natural the cross-cluster relationships are in the operator's specific topic space.
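A per-article bridge-density check is straightforward once each URL has a cluster assignment. The sketch assumes two inputs the operator would have to build: a mapping of URL to cluster, and each article's mid-body links already separated from navigation and related-article blocks. The 8 to 14 percent band comes from the paragraph above; the function and the sample URLs are hypothetical.

```python
def bridge_density(article_links, cluster_of, low=0.08, high=0.14):
    """Per-article share of mid-body links that point into a different cluster.

    article_links: {article_url: [target_url, ...]}, mid-body links only.
    cluster_of: {url: cluster_name}; targets without a cluster are skipped.
    """
    report = {}
    for article, targets in article_links.items():
        home = cluster_of.get(article)
        known = [t for t in targets if t in cluster_of]
        if home is None or not known:
            continue
        share = sum(cluster_of[t] != home for t in known) / len(known)
        if share < low:
            status = "below band"
        elif share > high:
            status = "above band"
        else:
            status = "in band"
        report[article] = (round(share, 3), status)
    return report


clusters = {f"/pricing/p{i}/": "pricing" for i in range(10)}
clusters.update({f"/onboarding/o{i}/": "onboarding" for i in range(3)})
mid_body_links = {
    "/pricing/p0/": [f"/pricing/p{i}/" for i in range(1, 10)] + ["/onboarding/o0/"],
    "/pricing/p1/": ["/pricing/p2/", "/pricing/p3/"],
    "/onboarding/o1/": ["/onboarding/o2/", "/pricing/p4/"],
}
for url, (share, status) in bridge_density(mid_body_links, clusters).items():
    print(url, share, status)
```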

The mature internal-linking architecture is a directed graph, not a content taxonomy. The work is in shaping the equity flow toward the pages that need to rank, not in mirroring the editorial table of contents.

A Note on Faceted Navigation, Breadcrumbs, and the Pagination Tax

Two infrastructure-level features interact with the internal-linking architecture and are worth noting: faceted navigation on e-commerce sites, and breadcrumbs across all site types. Both are link-graph artifacts that the operator does not always think of as such.

Faceted navigation generates massive numbers of internal links (every filter, every sort, every combination), and the link graph implications can swamp the editorial architecture if not handled deliberately. The standard advice (no-index the filter URLs, canonicalize aggressively, manage which combinations are crawlable through robots.txt) is well documented elsewhere. The link-graph implication is that an unmanaged faceted nav consumes the bulk of internal PageRank flow and starves the canonical category pages of the equity they need. Adding rel="nofollow" to filter links does not fix this on its own: under the 2009 change described earlier, nofollowed links still take their share of each page's outbound equity. A properly managed faceted nav uses robots.txt blocks to keep crawlers out of the non-indexable combinations and, where the templates allow, avoids rendering those combinations as crawlable links at all, which is what reserves the equity for the canonical category and product pages.

Breadcrumbs are the other infrastructure-level feature with link-graph consequences. A well-implemented breadcrumb on every page creates a deterministic path from the root to the page, which (a) shortens the click depth for every leaf, (b) creates symmetric upward links from every leaf to its parent hub (consolidating equity at the hub), and (c) when marked up with BreadcrumbList structured data, gives Google a clean signal it can use to display a breadcrumb trail in its search results. The link-graph value of breadcrumbs is substantial enough that a site without them is leaving meaningful equity unrouted; adding them retroactively is a high-leverage architectural move on a maturing property.
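For point (c), the markup is a small JSON-LD object. The sketch below generates schema.org BreadcrumbList markup from an ordered trail; the helper name and the example.com URLs are hypothetical. The link-graph effects in (a) and (b) come from the visible breadcrumb links rendered in the page HTML, not from this markup, so both are needed.

```python
import json


def breadcrumb_jsonld(trail):
    """Build schema.org BreadcrumbList JSON-LD from [(name, url), ...], root first."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "BreadcrumbList",
            "itemListElement": [
                {"@type": "ListItem", "position": i, "name": name, "item": url}
                for i, (name, url) in enumerate(trail, start=1)
            ],
        },
        indent=2,
    )


trail = [
    ("Home", "https://example.com/"),
    ("B2B SaaS pricing", "https://example.com/pricing/"),
    ("Anchor pricing", "https://example.com/pricing/anchor-pricing/"),
]
print(f'<script type="application/ld+json">\n{breadcrumb_jsonld(trail)}\n</script>')
```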

The pagination tax is the third feature worth naming. A category page with 14 pages of pagination produces 14 indexable URLs, each with a different in-link profile (page 1 has more in-links than page 14), and the equity flow across the paginated series is not always what the operator wants. Google's deprecation of rel="next" / rel="prev" in 2019 left the operator with a smaller toolkit for managing paginated series; the contemporary practice is to use clean self-referencing canonicals on each page, to ensure the deep pages are reachable, and to consider whether the canonical category page should aggregate enough content to make the deep pages unnecessary (the "consolidate to one URL" decision versus the "let pagination breathe" decision).

The Architectural Audit in Practice

The audit that has produced the most useful prioritization across advisory engagements is structured as follows. First, crawl the site and compute the IPR-equivalent metric per URL. Second, produce the in-link distribution histogram and inspect for the shape patterns (healthy asymmetric, hub-and-spoke bimodal, flat automated, disconnected silos, legacy editorial). Third, compute the click depth per page and flag any high-value pages at depth 5 or worse. Fourth, sample the anchor-text portfolio for the top 50 pages by current traffic and flag any with exact-match anchor share above 45 percent. Fifth, enumerate the orphan pages and decide for each whether to reconnect or remove. Sixth, identify the candidate cross-cluster bridges (pairs of leaves under different hubs that the operator knows are reader-relevant to each other) and prioritize the highest-value bridges for implementation.

Each step produces a list of architectural moves with estimated impact. The moves get prioritized by leverage: re-routing equity from a tag-archive page to an editorial leaf is high-leverage; adding contextual cross-links across two adjacent clusters is high-leverage; tweaking the exact-match anchor text on a single page is low-leverage. The audit's job is to surface the moves; the operator's job is to commit to the prioritization. The discipline is in the prioritization, not in the audit itself.

The audit cadence that has held up is quarterly for properties under 1,000 pages, monthly for properties between 1,000 and 10,000 pages, and continuous (built into the publishing workflow) for properties above 10,000 pages. The largest properties cannot afford a periodic audit because too much changes between audits; they need the link-graph implications of every publish baked into the editorial process, which requires tooling rather than human review.

Key Takeaways

Hub-and-spoke is a starter pattern that works under roughly two hundred pages and stops producing marginal ranking lift between 180 and 320 pages. The pillar pages saturate, the spokes plateau, and the architecture has no slot for cross-cluster relationships.
The mature alternative is a layered hub topology with three or four levels of hubs, cross-cluster bridges at each level, and leaf-to-leaf lateral links that bypass the hub hierarchy. The structure is closer to a layered directed graph than to a star.
The PageRank framing (equity flows through links, weighted by the source's accumulated equity divided by its outbound link count) is the right mental model for architectural decisions. The Reasonable Surfer extension adds link position and click probability as weights, so a contextual mid-body link is worth more than a footer link.
The in-link distribution per page is the most useful diagnostic. Healthy sites have an asymmetric distribution consolidating at the hubs; unhealthy sites have flat automated distributions, bimodal hub-and-spoke distributions, or disconnected-silo distributions.
The anchor-text portfolio matters. The exact-match share on internal in-links that has produced stable rankings sits in the 18 to 32 percent range; shares above roughly 45 percent produce instability under updates; shares below 12 percent leave signal on the table.
Click depth interacts with crawl frequency. Pages at depth 4 or deeper are crawled less than half as often as pages at depth 3, and content updates on deep pages propagate to the index proportionally slower. The remediation is to flatten click depth for the long-tail pages that matter through contextual cross-links from high-equity sources.
Internal PageRank-equivalent measurement (Screaming Frog, Sitebulb, OnCrawl) provides a quantitative loop on the architecture. The monthly delta on per-URL IPR surfaces architectural changes that no individual content or technical change accounts for.
The cross-cluster bridge is the single most impactful architectural move on a maturing property. Bridges pass equity across clusters, shorten lateral click paths, and expand the rankable keyword space to intersection queries. The right density sits in the 8 to 14 percent range of mid-body links per article.

Citations and Further Reading

Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (1998), the foundational paper on PageRank and the original conceptual basis for the equity-flow framing.
Google patent US 7,716,225, "Ranking documents based on user behavior and/or feature data" (Reasonable Surfer, 2010), the patent literature on click-probability weighting of link equity.
Bill Slawski's archived patent-analysis work at Go Fish Digital, the most comprehensive public reading of the Google patent literature on link weighting, link sculpting, and reasonable-surfer extensions.
Cyrus Shepard and the Moz internal-linking research, including the "Top 6 Ranking Factors" series and the experimental work on link types.
Kevin Indig's published work on topical authority and internal-link sculpting, covering both the practical operating decisions and the patent-literature background.
Google Search Central documentation on crawl budget, internal linking, and the deprecation of rel="next" / rel="prev" pagination markup.
HubSpot's original topic cluster framework documentation (2017-2019), the source of the popularized hub-and-spoke pattern.
The Screaming Frog, Sitebulb, OnCrawl, and DeepCrawl documentation on internal PageRank-equivalent computation, anchor-text auditing, and orphan-page detection.
Google's published documentation on the Penguin algorithm and its subsequent integration into the core algorithm, the canonical reference for the anti-manipulation gradient on anchor text.
John Mueller's commentary on internal linking, nofollow handling, and link-equity flow on the Search Off the Record podcast and in the Google Search Central SEO office-hours sessions.
Aleyda Solis's work on technical SEO and information architecture, particularly the operational frameworks for managing faceted navigation and crawl budget on large e-commerce sites.
The W3C HTML Living Standard and the Schema.org BreadcrumbList specification for the canonical references on breadcrumb implementation.