Site Migration Risk Modeling: What the Pre-Launch Audit Misses

TL;DR: The majority of catastrophic site migrations fail not on the items that appear on every public checklist (URL mapping, redirect plan, sitemap submission) but on the items that tend to be invisible until traffic has already dropped: surviving canonical tags pointing at the old domain, schema markup silently lost in the new templates, internal link graphs reset to a much shallower distribution, multi-hop redirect chains, sitemap-vs-index discrepancies that hide indexation gaps, and hreflang relationships broken across the cutover. A risk-modeling posture treats each of these as a probability-weighted failure mode, not a binary tick-box.

A note on the named companies and sources. Google's documentation, Aleyda Solis's published migration framework, the Distilled and Builtvisible migration write-ups, and the Search Console transition workflow appear in this essay as the available public reference points. Quantitative claims framed as advisory-engagement observation come from anonymized partner operators across e-commerce, publishing, and SaaS verticals, not from the named tooling vendors or agencies.

Why Migrations Fail in Practice

Every published migration framework agrees on the headline checklist. Map the URLs, configure the redirects, refresh the sitemaps, monitor Search Console, hold a war room for the first 72 hours. The published frameworks from Aleyda Solis, the Builtvisible engineering team, and Google's own developer documentation cover this layer thoroughly. The vast majority of migrations that fail catastrophically do not fail on this layer.

In advisory work across roughly thirty migrations between 2021 and 2024 (a mix of CMS swaps, domain consolidations, URL-pattern rewrites, and HTTPS cutovers), the pattern is consistent: the checklist items get done, the war room is staffed, and the traffic still drops twenty to forty percent in the first four weeks. The investigation that follows almost always uncovers a small number of secondary failure modes that nobody flagged because they sit between the canonical responsibilities of the development team, the SEO team, and the platform vendor. The canonical tag survives because the new template imports it from a legacy config. The schema markup disappears because the new design system is component-based and the old <script type="application/ld+json"> snippets were not migrated. The internal link graph resets because the old hand-curated cross-link blocks were not part of the content model.

These failure modes are tractable, but they require a probability-weighted view of risk rather than the binary checklist view. The rest of this essay reframes migration planning as a risk-modeling exercise, with explicit attention to the second-tier failure modes that the standard playbook tends to miss.

Eight Migration Archetypes and Their Risk Profiles

Not all migrations carry the same risk. The category of change determines which failure modes dominate. Treating "migration" as a single class of project is a category error that leads teams to either over-prepare for low-risk changes or under-prepare for high-risk ones.

The eight common archetypes can be placed along three axes: whether the canonical hostname changes, whether URL paths change, and whether the underlying CMS or rendering layer changes. The archetypes are not simply the eight yes/no combinations of those axes (several vary along one or more of them), but where an archetype sits on the axes largely determines which failure modes dominate and how large the expected drawdown is.

Eight Migration Archetypes by Axis, with Typical Risk Profile

Archetype	Host change	Path change	CMS change	Typical drawdown range	Dominant failure modes
HTTPS-only cutover	No (protocol)	No	No	0 to 5%	Mixed-content blocking, HSTS misconfiguration
URL pattern rewrite	No	Yes	No	10 to 25%	Redirect chain length, internal links not updated, canonical lag
CMS swap, paths preserved	No	No	Yes	10 to 20%	Schema markup loss, template-level canonical errors, internal link graph reset
CMS swap with path change	No	Yes	Yes	20 to 40%	All of the above plus redirect-map gaps
Domain consolidation	Yes	Variable	Variable	15 to 35%	Canonical persistence, sitemap-vs-index gap, brand-search dilution
Subdomain to subfolder (or vice versa)	Yes (effectively)	Yes	Variable	10 to 30%	Indexation lag, link equity propagation delay
Internationalization rollout	Variable	Yes	Variable	Variable	Hreflang breakage, geo-targeting misconfiguration, duplicate content
Static-to-dynamic or SPA migration	No	Variable	Yes	10 to 40%	Render-vs-crawl gap, lazy loading, hydration failures, schema loss

The drawdown ranges are not predictions; they are the central tendencies observed across the migrations we have reviewed in advisory engagements. The variance within each archetype is large: a well-executed CMS swap with path change can produce drawdowns in the low single digits, while a poorly executed HTTPS cutover (the lowest-risk archetype on paper) has produced traffic losses in the high teens when HSTS was misconfigured and the canonical chain broke.

The point of the archetype frame is operational: it forces the team to ask "which kind of migration is this?" before reaching for a checklist, and to weight the failure modes accordingly.

The Probability-Weighted Failure Mode View

The risk-modeling alternative to a checklist treats each known failure mode as a probability of occurrence multiplied by a magnitude of impact. The total expected loss is the sum across failure modes. The pre-launch audit's job is to drive the probability of each high-magnitude failure mode toward zero.
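The expected-loss arithmetic fits in a spreadsheet, but a short Python sketch makes the structure explicit. The failure-mode names come from this essay; the probability and impact values are placeholders invented for illustration, not advisory-engagement data, and the simple sum assumes the failure modes do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    probability: float  # estimated chance this failure occurs (0 to 1)
    impact: float        # traffic drawdown if it does occur (fraction, 0 to 1)

    @property
    def expected_loss(self) -> float:
        return self.probability * self.impact


def expected_drawdown(modes: list[FailureMode]) -> float:
    """Sum of probability x impact; a linear approximation that ignores overlap."""
    return sum(m.expected_loss for m in modes)


def audit_priority(modes: list[FailureMode]) -> list[FailureMode]:
    """Order the audit so the largest expected losses are driven down first."""
    return sorted(modes, key=lambda m: m.expected_loss, reverse=True)


# Placeholder estimates for a hypothetical CMS swap with path change.
modes = [
    FailureMode("redirect map gaps", probability=0.30, impact=0.30),
    FailureMode("canonical persistence", probability=0.30, impact=0.20),
    FailureMode("schema markup loss", probability=0.40, impact=0.10),
    FailureMode("internal link graph reset", probability=0.50, impact=0.15),
]

print(f"expected drawdown: {expected_drawdown(modes):.1%}")
for mode in audit_priority(modes):
    print(f"  {mode.name}: {mode.expected_loss:.1%}")
```

Re-estimating one mode after its audit artifact is in hand (for instance, lowering canonical persistence to a small probability once the staging crawl is clean) shows how much expected loss that piece of audit work retired.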

For a CMS swap with path change (the archetype with the highest typical drawdown), the failure-mode decomposition looks roughly as follows in advisory work.

Contribution to Drawdown by Failure Mode, Across Advisory Partner CMS-Swap Migrations with Path Change

These contribution weights are not stable across sites. A multi-region publisher will have a much larger hreflang exposure; a thin-content SaaS site will have very little schema-markup downside because there was little structured data to begin with. The exercise is to estimate the contributions for the specific site under migration, then prioritize the audit accordingly.

The standard playbook over-invests in the first item (redirect map gaps), which, together with item seven, gets most of the checklist attention, and under-invests in items two through six. Those middle items are the ones that produce the silent twenty-percent drawdowns nobody can explain in the post-mortem.

The Canonical Tag That Survives the Migration

The single most common silent failure we have audited is canonical tag persistence. The pattern: the migration team writes new redirects, updates the sitemap, and announces the launch. The new templates, however, were derived from the legacy templates and inherit a hardcoded canonical URL function that points at the old hostname (or, on a CMS swap, at the old URL pattern). The redirects fire correctly when a user or crawler requests the old URL, but the new URL serves a <link rel="canonical"> pointing back at the old URL, which 301s to the new URL, which canonicals back. The crawler sees the conflict and does not consolidate.

Google's documentation treats both redirects and rel="canonical" annotations as canonicalization signals rather than directives: a redirect is a strong signal that its target should become canonical, and a canonical link element is a strong signal that the URL it names should become canonical. When the 301 points from the old URL to the new one and the new URL's canonical tag points back at the old one, the two strong signals contradict each other and Google has to choose. If the canonical persists pointing at the old URL after the migration, the migrated URLs may not be selected for indexing, and the old URLs can stay in the index for as long as Google keeps choosing them as canonical.

The audit pattern: before launch, crawl the staging environment with the new templates pointed at production-equivalent URLs, and verify that every URL emits a canonical pointing at itself (or, intentionally, at a designated canonical target on the new site). The check is mechanically simple; what makes it commonly missed is that it falls between the SEO team (who do not own the template code) and the engineering team (who do not own the canonical strategy).
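The staging check can be sketched with Python's standard library. This is a minimal sketch, assuming the crawler has already fetched each staging page's HTML; the hostnames are hypothetical, and the normalization rules (lowercase scheme and host, ignore fragments, keep path and query as-is) are an assumption to adjust to the site's own URL conventions.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: value or "" for name, value in attrs}
        if "canonical" in attributes.get("rel", "").lower().split() and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def normalize(url: str) -> str:
    """Lowercase scheme and host and drop any fragment; path and query are kept,
    because a trailing-slash or parameter difference is a different URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def check_canonical(page_url: str, html: str, expected: Optional[str] = None) -> str:
    """Return 'ok', 'missing', 'multiple', or 'mismatch:<resolved canonical>'.

    expected defaults to page_url, i.e. a self-referencing canonical."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if not finder.hrefs:
        return "missing"
    if len(finder.hrefs) > 1:
        return "multiple"
    resolved = normalize(urljoin(page_url, finder.hrefs[0]))  # relative hrefs resolve against the page
    return "ok" if resolved == normalize(expected or page_url) else f"mismatch:{resolved}"


legacy_leak = '<head><link rel="canonical" href="https://old.example.com/shoes/"></head>'
print(check_canonical("https://www.example.com/shoes/", legacy_leak))
# mismatch:https://old.example.com/shoes/
```

Run over every staging URL, any result other than "ok" is a template to hand back to engineering before launch.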

Schema Markup Loss as Silent Drawdown

The second commonly missed failure mode is schema markup loss. Legacy CMSs typically embed JSON-LD or microdata at the template level: Product schema on PDPs, Article schema on editorial pages, Organization and SiteNavigationElement schema on the masthead, FAQPage schema on certain landing pages. When a CMS swap happens, the structured data is often handled as a content concern (move the title, the body, the meta description) rather than a template concern (move the schema blocks). The new design system, frequently component-based and built without a structured-data abstraction, simply does not emit them.

The traffic consequence is subtle. The pages still rank for their primary terms; the page-one positions hold. What disappears is the rich-result eligibility (review stars, FAQ accordions, sitelinks search box, organization knowledge panel features) that drove click-through rate on a meaningful subset of queries. (Google has since restricted FAQ rich results to a narrow set of authoritative sites and retired the sitelinks search box.) Google Search Console's Enhancements reports flag the loss within a week, but the click-through-rate effect accumulates over months and is often misattributed to "seasonality" or "algorithm update" in the absence of a deliberate baseline.

In partner data, the click-through-rate drawdown from total Product schema loss on an e-commerce category ranges from 8 to 22% on queries that previously triggered rich results. The blended impact on total organic traffic depends on the share of traffic that came through schema-eligible queries (often 30 to 50% on large e-commerce sites, much less on editorial or SaaS sites).
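The blended figure is a multiplication, which makes the dependence on traffic mix easy to see. A minimal sketch, assuming the CTR drawdown applies only to schema-eligible queries and that impressions are unchanged; the inputs are the corners of the ranges quoted above, not a forecast for any particular site.

```python
def blended_traffic_impact(ctr_drawdown: float, schema_eligible_share: float) -> float:
    """Share of total organic traffic lost when rich results disappear.

    Assumes the CTR drawdown applies only to the schema-eligible share of
    traffic and that rankings, and therefore impressions, hold steady."""
    return ctr_drawdown * schema_eligible_share


# Corners of the partner-data ranges quoted above for large e-commerce sites.
low = blended_traffic_impact(ctr_drawdown=0.08, schema_eligible_share=0.30)
high = blended_traffic_impact(ctr_drawdown=0.22, schema_eligible_share=0.50)
print(f"{low:.1%} to {high:.1%} of total organic traffic")  # 2.4% to 11.0% of total organic traffic
```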

Indexed Traffic After Migration, With and Without Schema Loss (Observed Across Advisory Partner Migrations, Indexed to 100)

The curve shape is the diagnostic. A migration that handles every other failure mode recovers to baseline within eight to twelve weeks; a migration that lost schema markup recovers the indexation but not the click-through, and the new baseline settles 15 to 20% below the pre-migration level. The shortfall does not look like a migration failure on most dashboards because the rankings are intact; it looks like a long-term decay that nobody can explain.

The audit pattern: maintain a schema inventory of the legacy site before migration (Schema.org type, page template it appears on, frequency of occurrence). Verify after launch that the new templates emit the equivalent schema, validated through the Rich Results Test for a sample of URLs per template. The discipline is in keeping the inventory, not in any individual validation.
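Keeping the inventory is mostly bookkeeping. A minimal sketch, assuming a sample of HTML per template has already been crawled from both sites and that legacy and new templates have been mapped to shared names (the "pdp" label below is hypothetical). It reads only JSON-LD, so microdata would need a separate parser, and it does not replace validation in the Rich Results Test.

```python
import json
from collections import Counter
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def schema_types(html: str) -> set[str]:
    """Every Schema.org @type declared in JSON-LD on one page, nested types included."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    found: set[str] = set()

    def walk(node) -> None:
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            for value in node.values():  # descend into nested entities and @graph arrays
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in extractor.blocks:
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            pass  # malformed JSON-LD is a finding in its own right; skipped in this sketch
    return found


def inventory(pages_by_template: dict[str, list[str]]) -> Counter:
    """Count pages per (template, schema type) from sampled HTML documents."""
    counts: Counter = Counter()
    for template, documents in pages_by_template.items():
        for html in documents:
            for schema_type in schema_types(html):
                counts[(template, schema_type)] += 1
    return counts


def missing_after_migration(legacy: Counter, new: Counter) -> list[tuple[str, str]]:
    """(template, type) pairs the legacy site emitted and the new site does not."""
    return sorted(key for key in legacy if new[key] == 0)


legacy_pdp = '<script type="application/ld+json">{"@type": "Product", "offers": {"@type": "Offer"}}</script>'
new_pdp = "<main>Component-rendered page with no JSON-LD</main>"
print(missing_after_migration(inventory({"pdp": [legacy_pdp]}), inventory({"pdp": [new_pdp]})))
# [('pdp', 'Offer'), ('pdp', 'Product')]
```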

The Internal Link Graph Reset

The third failure mode is more subtle and more structural. Most established sites have a hand-curated internal-linking layer built over years: related-articles blocks at the bottom of editorial pages, cross-sell modules on PDPs, hub-page topic clusters, contextual mid-article links, footer concept links. This layer is often built into the content layer of the legacy CMS rather than the template layer, which means it does not migrate with the templates.

In a CMS swap, the migration team typically focuses on getting the content moved and the templates rebuilt. The internal-linking layer, if it was not modeled explicitly in the content schema, has to be rebuilt. In practice it gets rebuilt as a generic "related articles" widget powered by tag matching or a generic "more from this category" block, and the dense, hand-curated cross-link graph is replaced with a much shallower automated one.

A useful distinction here is between curated and computed internal linking. Curated linking is editorial: an author hand-picks the cross-links inside a piece of content and embeds them inline. Computed linking is algorithmic: a service generates related-link blocks from tag overlap, embedding similarity, or co-visitation. Established sites typically run both layers, with curated linking dominating in the contextual mid-article position (where it has the most ranking value) and computed linking handling the footer and sidebar slots. A migration that preserves only the computed layer strips out exactly the high-value links and keeps only the lower-value ones.

The link-graph consequence is direct. Internal links transmit PageRank-equivalent signal; the link distance from the root to any given page (the click-depth) affects crawl frequency and ranking probability. A site that had an average click depth of 3 to 4 from root to leaf, with significant cross-page interlinking, becomes a site with average click depth 4 to 5 and almost no cross-page interlinking. The crawl frequency drops; the long-tail visibility erodes.

Dense interlinking before migration (advisory illustration)

Shallow interlinking after migration (advisory illustration)

The audit pattern: before migration, export the internal link graph from a crawl of the legacy site (Screaming Frog, Sitebulb, or the equivalent). Measure the in-link distribution per URL, the average click depth, and the cross-cluster link density. After migration, recrawl and compare. If the distributions have flattened, the link-graph layer was lost. The remediation is not a checklist item; it is a content-engineering project to rebuild the curated layer in the new content model.
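The comparison metrics can be computed from any crawler's link export. A minimal sketch, assuming the export has been reduced to (source, target) pairs of internal URLs; the cluster rule (first path segment) is a hypothetical stand-in for the site's real topic taxonomy, and the default threshold of ten inbound links mirrors the measure used in the 2023 publisher case.

```python
from collections import Counter, deque
from statistics import mean, median
from typing import Callable


def click_depths(edges: list[tuple[str, str]], root: str) -> dict[str, int]:
    """Breadth-first search from the root; depth is the minimum number of clicks."""
    outlinks: dict[str, list[str]] = {}
    for source, target in edges:
        outlinks.setdefault(source, []).append(target)
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in outlinks.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def link_graph_summary(edges: list[tuple[str, str]], root: str,
                       cluster_of: Callable[[str], str], inlink_threshold: int = 10) -> dict:
    """Metrics to diff between the legacy crawl and the post-migration crawl."""
    unique = {(s, t) for s, t in edges if s != t}  # drop self-links and duplicate edges
    depths = click_depths(list(unique), root)
    pages = set(depths)  # only pages reachable from the root
    inlinks = Counter(target for _, target in unique)
    # Edges out of the root are excluded so homepage navigation is not counted as cross-linking.
    content_edges = [(s, t) for s, t in unique if s != root]
    cross = sum(cluster_of(s) != cluster_of(t) for s, t in content_edges)
    return {
        "reachable_pages": len(pages),
        "avg_click_depth": mean(depths[p] for p in pages if p != root),
        "median_inlinks": median(inlinks[p] for p in pages),
        "share_with_min_inlinks": sum(inlinks[p] >= inlink_threshold for p in pages) / len(pages),
        "cross_cluster_share": cross / len(content_edges) if content_edges else 0.0,
    }


def first_segment(url: str) -> str:  # hypothetical cluster rule
    return url.strip("/").split("/")[0]


edges = [
    ("/", "/shoes/"), ("/", "/guides/"),
    ("/shoes/", "/shoes/runner"), ("/guides/", "/guides/fit"),
    ("/guides/fit", "/shoes/runner"),  # a curated cross-cluster link
]
summary = link_graph_summary(edges, "/", first_segment, inlink_threshold=2)
print(summary)
```

Running the same function on the legacy and post-migration exports gives the before/after pair to compare; a rising average depth together with falling in-link and cross-cluster shares is the flattening signature described above.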

From Experience

Advisory work, large editorial publisher CMS swap, 2023

The publisher migrated from a custom CMS to a headless setup over six months. The redirect map was clean, schema was preserved, canonicals were correct. Traffic dropped 22% in the first four weeks and stayed down. The pattern we found in the crawl-graph comparison was unambiguous: average click depth went from 3.6 to 5.1, and the proportion of pages with at least 10 inbound internal links dropped from 38% to 11%. The legacy hand-curated "see also" blocks had been adding thousands of editorially curated links per month, and they had not been modeled in the new content schema. The remediation took a quarter (build the related-content service, backfill the relationships) and traffic recovered over the following two quarters. Nothing in the standard migration checklist would have surfaced the issue.

Redirect Chain Length and the Crawl-Budget Tax

The fourth commonly missed failure mode is redirect chain length. The basic principle of redirect mapping is that every URL in the legacy index should map to a single destination URL in the new index, and the 301 should fire in a single hop. In practice, large-site migrations frequently produce two-hop and three-hop redirect chains because two or more redirect rules are layered: the legacy URL pattern redirects to an intermediate URL pattern (often a normalization rule that was already in place before the migration), which then redirects to the new pattern.

Google's documentation states that Googlebot follows up to ten redirect hops before reporting a redirect error, and Google representatives have also cited a figure of five hops per crawl attempt. Every hop is a separate request, so crawl budget is consumed per hop; whether link signals also attenuate along a chain is disputed, since Google has stated that 3xx redirects no longer lose PageRank. A migration that produces chains of three or four hops on a meaningful share of URLs can impose a sustained crawl-budget tax that delays the recrawl of the new URLs by weeks or months on a large site.

The audit pattern: after the redirect rules are deployed in staging, crawl the legacy URL list and record the hop count to the terminal destination. Any chain longer than one hop is a candidate for collapse: rewrite the redirect rules so that the legacy URL maps directly to the final destination in one hop, bypassing the intermediate. The collapse work is mechanical but tedious, and it commonly gets deprioritized because the chain "still works" from a user-facing perspective.
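Hop counting and collapsing can be done against the rule table rather than over live HTTP. A minimal sketch, assuming the rules have been expanded into one source-to-target pair per legacy URL (pattern-based rules would first need to be evaluated against the legacy URL list); the ten-hop safety limit is an arbitrary choice, and the sketch does not confirm that the terminal URL actually returns a 200.

```python
def resolve_chain(url: str, redirects: dict[str, str], max_hops: int = 10) -> tuple[str, int]:
    """Follow the redirect map from url; return (terminal URL, hop count).

    Raises ValueError on a loop or a chain longer than max_hops."""
    seen = {url}
    current, hops = url, 0
    while current in redirects:
        current = redirects[current]
        hops += 1
        if current in seen or hops > max_hops:
            raise ValueError(f"redirect loop or runaway chain starting at {url}")
        seen.add(current)
    return current, hops


def chains_to_fix(redirects: dict[str, str]) -> dict[str, int]:
    """Legacy URLs that take more than one hop to reach their destination."""
    report = {}
    for source in redirects:
        _, hops = resolve_chain(source, redirects)
        if hops > 1:
            report[source] = hops
    return report


def collapse(redirects: dict[str, str]) -> dict[str, str]:
    """Rewrite every rule so its source points straight at the terminal URL."""
    return {source: resolve_chain(source, redirects)[0] for source in redirects}


rules = {
    "http://example.com/Old-Page": "https://example.com/old-page",   # pre-existing normalization rule
    "https://example.com/old-page": "https://example.com/new/page",  # migration rule
}
print(chains_to_fix(rules))  # {'http://example.com/Old-Page': 2}
print(collapse(rules))
```

Running chains_to_fix on the collapsed table should return an empty report, which makes a convenient acceptance criterion for the artifact.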

On a 50,000-URL e-commerce migration we audited, collapsing a two-hop pattern into one hop on roughly 18,000 URLs reduced the median time-to-recrawl of the new URLs from 28 days to 11 days. The improvement showed up in indexation curves within two weeks of the collapse.

Sitemap, Index, and the Gap Between Them

The fifth failure mode is the sitemap-versus-index gap. The standard playbook says to refresh the sitemap, submit it to Search Console, and monitor the indexation report. The gap that opens up under this routine is between what the sitemap claims (the canonical URL list) and what Google has actually indexed (a subset of the sitemap, plus some legacy URLs the redirects did not catch, minus some new URLs that hit an indexing issue).

The Search Console Page indexing report (formerly the Index Coverage report) is the source of truth for this gap, and the under-counted failure mode is that the report is consulted only in aggregate ("X URLs indexed, Y not") rather than in the breakdown that exposes the operating issues. The "Discovered, not currently indexed" bucket is the leading indicator for crawl-budget pressure on new URLs. The "Crawled, not currently indexed" bucket is the leading indicator for content-quality or thin-content issues that the migration exposed (often because the new templates reduced unique content per page). The "Page with redirect" bucket reveals legacy URLs that the redirect map missed or that an intermediate URL is still serving.

Search Console Indexation Buckets and Migration-Diagnostic Reading

Bucket	What it suggests	Typical migration cause	Remediation
Submitted and indexed	Healthy	Working as intended	None
Discovered, not currently indexed	Crawl-budget pressure	Internal link graph reset, low priority for new URLs, redirect chain tax	Improve internal linking, reduce chain length, prune low-value URLs
Crawled, not currently indexed	Quality threshold not cleared	Thin templates, duplicate templates, weak content on new layout	Increase unique content per template, consolidate near-duplicate URLs
Page with redirect	Legacy URL still in submitted set	Sitemap includes legacy URLs by mistake, or intermediate URL serving a 200	Clean sitemap to new URLs only; fix intermediate URL
Duplicate, Google chose different canonical	Canonical signal disagrees with sitemap	Canonical tag persistence, hreflang misconfiguration, parameter handling	Align canonical tags with sitemap-declared URLs
Soft 404	New template returning thin or empty content for some URL patterns	Template error or missing content for migrated URLs	Investigate and fix template; serve real 404 or 410 if intended

The audit pattern: weekly Search Console indexation review for the first eight weeks post-migration, broken down by bucket, with a trend line per bucket. A migration that is recovering shows "Submitted and indexed" rising and the diagnostic buckets falling. A migration that is silently failing shows one or more diagnostic buckets stuck or growing.

A subtler version of the gap shows up when the sitemap is generated dynamically from the production database. If the dynamic generator includes draft URLs, parameterized URLs, or filter-state URLs that the canonical strategy excludes from indexation, the sitemap claim and the canonical claim disagree, and Google receives conflicting signals that can reduce the weight it gives the sitemap across the entire submitted set. The remediation is to generate the sitemap from the same canonical-URL function the templates use, not from an independent query against the database. The two generators must agree by construction, not by audit.
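Agreement by construction means one function decides the canonical URL and every consumer calls it. A minimal sketch, assuming a hypothetical page model with a published flag and an optional canonical override for variants; a real CMS will store these differently, but the shape (one canonical function, two consumers) is the point.

```python
from dataclasses import dataclass
from html import escape
from typing import Iterable, Optional

SITE = "https://www.example.com"  # hypothetical production origin


@dataclass(frozen=True)
class Page:
    path: str                             # path the page is served at
    published: bool = True
    canonical_path: Optional[str] = None  # set on variants (filters, parameters) that defer to another URL


def canonical_url(page: Page) -> str:
    """The one function both the templates and the sitemap generator call."""
    return SITE + (page.canonical_path or page.path)


def canonical_tag(page: Page) -> str:
    """Template consumer: the <link rel="canonical"> element for a page."""
    return f'<link rel="canonical" href="{escape(canonical_url(page))}">'


def sitemap_urls(pages: Iterable[Page]) -> list[str]:
    """Sitemap consumer: published pages that are their own canonical, nothing else."""
    return sorted(
        canonical_url(page)
        for page in pages
        if page.published and canonical_url(page) == SITE + page.path
    )


pages = [
    Page("/shoes/"),
    Page("/shoes/?colour=red", canonical_path="/shoes/"),  # filter state
    Page("/shoes/spring-preview", published=False),        # draft
]
print(sitemap_urls(pages))      # ['https://www.example.com/shoes/']
print(canonical_tag(pages[1]))  # <link rel="canonical" href="https://www.example.com/shoes/">
```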

The other operating discipline that surfaces silent failures here is splitting the sitemap into multiple files by template type (one for PDPs, one for category pages, one for editorial content, one for glossary or terms). The split allows the Search Console indexation report to be read per template, which makes the diagnostic bucket reading dramatically more legible. A migration where the indexation rate on PDPs is 95% but on glossary terms is 30% is a different problem from a migration where indexation is uniformly 75% across templates, and the consolidated sitemap hides the difference.
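Splitting by template is a small change to the sitemap generator. A minimal sketch using Python's xml.etree.ElementTree, assuming the canonical URLs are already grouped by template; the file names and base URL are hypothetical. Submitting the per-template files (or the index that lists them) lets indexing be read per sitemap file, and therefore per template.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(urls_by_template: dict[str, list[str]], base_url: str) -> dict[str, bytes]:
    """One urlset file per template (chunked at the protocol limit) plus an index."""
    files: dict[str, bytes] = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for template, urls in sorted(urls_by_template.items()):
        for part, start in enumerate(range(0, len(urls), MAX_URLS_PER_FILE), start=1):
            urlset = ET.Element("urlset", xmlns=NS)
            for url in urls[start:start + MAX_URLS_PER_FILE]:
                ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url  # text is XML-escaped
            name = f"sitemap-{template}-{part}.xml"
            files[name] = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
            ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base_url}/{name}"
    files["sitemap-index.xml"] = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return files


files = build_sitemaps(
    {"pdp": ["https://www.example.com/p/1"], "editorial": ["https://www.example.com/guides/fit"]},
    base_url="https://www.example.com",
)
print(sorted(files))  # ['sitemap-editorial-1.xml', 'sitemap-index.xml', 'sitemap-pdp-1.xml']
```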

Hreflang and the International Migration Trap

The sixth failure mode is hreflang breakage on international sites, and it is the single most under-counted item in migration audits for any site with more than two locales. Hreflang is fragile by design: every locale of a piece of content must reference every other locale by canonical URL, and any asymmetry (locale A says it has a counterpart B, but locale B does not return the reciprocal) causes the relationship to be ignored.

On a domain consolidation or URL-pattern migration, the hreflang URLs change for every locale at once. If the rebuild of the hreflang relationships is not symmetric and complete, the entire hreflang graph collapses and the geo-targeting that was driving locale-correct serving in Google's index breaks. The user-visible failure is that a French user starts seeing the English page in search results, or a UK user starts seeing the US-currency page.

The audit pattern: before migration, export the full hreflang relationship matrix from a crawl of the legacy site. After migration, recrawl and verify that the matrix is reconstituted with the new URLs and that every relationship is reciprocal. Tools that visualize the hreflang graph (Sitebulb, and the dedicated hreflang validators in OnCrawl and DeepCrawl, now Lumar) make the audit tractable. Done manually, it is impractical above three locales.
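The reciprocity check is mechanical once the crawl has produced the matrix. A minimal sketch, assuming the crawler export has been reshaped into a mapping from each page URL to its declared {hreflang code: URL} pairs (the URLs below are hypothetical). It checks self-references and return links and flags targets missing from the crawl (often legacy URLs that now redirect), but it does not validate the language and region codes themselves.

```python
def hreflang_problems(annotations: dict[str, dict[str, str]]) -> list[str]:
    """annotations maps each crawled page URL to its declared {hreflang code: URL}.

    Reports missing self-references, targets absent from the crawl, and
    pairs that are not reciprocal."""
    problems = []
    for page, alternates in sorted(annotations.items()):
        if page not in alternates.values():
            problems.append(f"{page}: no self-referencing hreflang")
        for code, target in sorted(alternates.items()):
            if target == page:
                continue
            if target not in annotations:
                problems.append(f"{page}: {code} -> {target} not in crawl (redirected or missing?)")
            elif page not in annotations[target].values():
                problems.append(f"{page}: {code} -> {target} does not link back")
    return problems


matrix = {
    "https://example.com/en/": {"en": "https://example.com/en/", "fr": "https://example.com/fr/"},
    "https://example.com/fr/": {"fr": "https://example.com/fr/"},  # English alternate missing
}
for problem in hreflang_problems(matrix):
    print(problem)
# https://example.com/en/: fr -> https://example.com/fr/ does not link back
```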

Render Versus Crawl on JavaScript-Heavy Migrations

The seventh failure mode applies specifically to migrations that move from server-rendered to client-rendered architectures (or vice versa, in the case of teams moving from a JavaScript SPA back to a server-rendered or static-generated approach). The risk is that the version of the page Googlebot crawls is materially different from the version a user sees.

Google's documented behavior is to process JavaScript pages in stages: Googlebot crawls the initial HTML, queues the page for rendering, and updates the index once the rendered result is available. Google notes that a page may wait in the rendering queue for a few seconds or considerably longer. This works well when the initial HTML is a meaningful approximation of the rendered page; it works badly when the initial HTML is largely empty (a <div id="root"></div> and a script tag) and the content arrives entirely via client-side rendering.

The migration risk is twofold. First, the new architecture may serve significantly less content in the initial HTML than the legacy architecture did, which produces a render-vs-crawl gap that delays indexation. Second, the client-side code may load content only in response to events Googlebot does not generate (scroll events, clicks on tabs or "load more" buttons, other user interaction), so that content is never rendered for the crawler and never seen.

The audit pattern: for every important template, compare the initial HTML response (the curl-equivalent body) with the rendered DOM (the Chrome DevTools snapshot after JavaScript execution). Anything important to ranking should appear in the initial response. Anything that only appears after JavaScript execution is subject to the rendering queue and is at risk of being missed or delayed. The live test in the Search Console URL Inspection tool ("View tested page") is the most direct view of what Googlebot sees after rendering.
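The comparison can be scripted once both versions are saved. A minimal sketch, assuming the rendered DOM has been captured separately (for example with Chrome DevTools or a headless browser) and that each template has a hypothetical list of ranking-critical phrases; substring matching on extracted text is crude, but it is enough to surface content and links that exist only after JavaScript runs.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Visible text plus <a href> targets, ignoring script, style and template content."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.links: set[str] = set()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a" and not self._skip_depth:
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

    def visible_text(self) -> str:
        return " ".join(" ".join(self.chunks).split())  # collapse runs of whitespace


def parse(html: str) -> TextAndLinks:
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser


def render_gap(initial_html: str, rendered_html: str, must_have: list[str]) -> dict:
    """Compare the raw server response with the rendered DOM for one template."""
    initial, rendered = parse(initial_html), parse(rendered_html)
    return {
        "missing_from_initial": [p for p in must_have if p not in initial.visible_text()],
        "missing_after_render": [p for p in must_have if p not in rendered.visible_text()],
        "links_only_after_render": sorted(rendered.links - initial.links),
    }


initial = '<div id="root"></div><script src="/app.js"></script>'
rendered = '<div id="root"><h1>Trail Runner 2</h1><a href="/shoes/">All shoes</a></div>'
print(render_gap(initial, rendered, must_have=["Trail Runner 2"]))
```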

The probability-weighted audit catches what the binary checklist misses. The checklist asks "did we do X?"; the audit asks "what is the probability that we did X imperfectly, and what is the impact if we did?"

The Pre-Launch Risk Model in Practice

Putting the failure modes together produces an operating template for the pre-launch risk audit. The discipline is to enumerate the failure modes, estimate the probability and impact for the specific migration, and prioritize the audit work accordingly. The list below is the ordered audit we have used in advisory work on CMS swaps with path change, the highest-risk archetype.

Verify the URL map is complete and 1-to-1 by crawling the legacy site and checking that every URL has a destination in the redirect rules.
Verify canonical tag emission on the new templates with a staging crawl that confirms every URL emits a self-referencing canonical to its new URL.
Inventory the schema markup on the legacy site by template; verify schema emission on the new templates and validate via the Rich Results Test for a sample.
Export the legacy internal link graph (Screaming Frog or equivalent); after migration, recrawl and compare the link distribution and average click depth.
Audit redirect chain lengths; collapse every multi-hop chain to a single hop, starting with chains of three or more hops.
Clean the new sitemap to contain only the canonical new URLs; submit and monitor Search Console indexation buckets weekly.
Reconstitute hreflang relationships and validate symmetry across all locales.
Compare initial HTML versus rendered DOM for every important template; verify the initial response contains ranking-critical content.

Each step has a measurable artifact (a crawl diff, a Search Console screenshot, a Rich Results Test pass) and a defined acceptance criterion. The audit is not done until every artifact is filed and every criterion is met.

A Note on Post-Launch Monitoring

The risk-modeling posture does not stop at launch. The first 14 days after launch are the highest-information period of the migration, and the monitoring stack should be calibrated accordingly. The basics: daily Search Console indexation review, daily clicks/impressions trend per template type, weekly crawl-stats review for spider behavior, and an alerting threshold on traffic deltas per template that triggers investigation.

The non-basics: a daily rank-tracking sample for the top 200 commercial keywords, a daily backlink-graph snapshot to confirm the migration did not break inbound link consolidation, and a server-log analysis pass at week one and week four to compare Googlebot crawl distribution before and after. The server logs are the authoritative record of what Googlebot is actually doing, and they expose patterns (crawl frequency by template, hop counts, response codes by directory) that no front-end tool can show.
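A first pass over the logs needs little more than a regular expression. A minimal sketch, assuming the server writes the Apache/Nginx combined log format (the sample lines and IPs are invented); grouping by first path segment stands in for a real template mapping, and because the user-agent string can be spoofed, Google recommends verifying Googlebot by reverse DNS lookup before trusting the counts.

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format; adjust the pattern to the server's own format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def section(path: str) -> str:
    """Group a request path by its first segment ('/' for the homepage)."""
    first = path.split("?", 1)[0].strip("/").split("/", 1)[0]
    return f"/{first}/" if first else "/"


def googlebot_profile(lines) -> tuple[Counter, Counter]:
    """Count requests whose user agent claims Googlebot, by section and by (section, status).

    The user agent can be spoofed; verify the IPs before trusting these counts."""
    by_section: Counter = Counter()
    by_status: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        sec = section(match["path"])
        by_section[sec] += 1
        by_status[(sec, match["status"])] += 1
    return by_section, by_status


def distribution(counter: Counter) -> dict:
    """Share of crawl per key, for comparing a pre-launch week with a post-launch week."""
    total = sum(counter.values()) or 1
    return {key: count / total for key, count in counter.most_common()}


UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'192.0.2.1 - - [10/Mar/2024:10:00:00 +0000] "GET /shoes/runner HTTP/1.1" 200 512 "-" "{UA}"',
    f'192.0.2.1 - - [10/Mar/2024:10:00:05 +0000] "GET /old/runner HTTP/1.1" 301 0 "-" "{UA}"',
    '203.0.113.9 - - [10/Mar/2024:10:00:07 +0000] "GET /shoes/ HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
sections, statuses = googlebot_profile(sample)
print(distribution(sections))  # {'/shoes/': 0.5, '/old/': 0.5}
print(statuses)
```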

The single most common late-stage failure is that the team declares the migration "done" at week two when the headline traffic numbers look normal, and the silent drawdown patterns (the long-tail erosion from internal-link-graph reset, the click-through erosion from schema loss) only become visible at week six or eight. By then, the war room has dispersed, the engineering team has moved to the next project, and the diagnostic context has been lost. The discipline of running the full audit at weeks two, four, and eight is what catches the silent failures before they harden into a new baseline.

The monitoring stack we have found most useful in advisory work has four layers. Layer one is the headline traffic dashboard, broken down by template and country, refreshed daily, with a deviation alert per template against a pre-migration baseline. Layer two is the Search Console daily export, pivoted by indexation bucket and by template, so that "Crawled, not currently indexed" can be tracked separately for PDPs and editorial. Layer three is the server-log analysis, weekly, comparing Googlebot crawl distribution and response-code distribution before and after. Layer four is the rank-tracking sample on the top commercial keywords, daily, with a delta against the pre-migration position. The four layers are independent and triangulate against each other: a drop in headline traffic that is not reflected in the Search Console buckets is probably an analytics issue, not a migration issue, and the diagnosis depends on having the lower layers available.
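The per-template deviation alert in layer one can be sketched in a few lines. A minimal sketch, assuming daily clicks per template are already exported; the template names and the 15% threshold are hypothetical, and a production version would compare against the same weekday or a seasonally adjusted baseline rather than a flat mean.

```python
from statistics import mean


def deviation_alerts(baseline: dict[str, list[float]], latest: dict[str, float],
                     threshold: float = 0.15) -> dict[str, float]:
    """Templates whose latest daily clicks deviate from the pre-migration mean
    by more than `threshold` (a fraction; 15% is an arbitrary default).

    A template absent from `latest` is treated as zero clicks, which is itself
    worth an alert after a migration."""
    alerts = {}
    for template, history in baseline.items():
        expected = mean(history)
        if expected == 0:
            continue  # no baseline traffic, so a relative change is undefined
        change = (latest.get(template, 0.0) - expected) / expected
        if abs(change) > threshold:
            alerts[template] = round(change, 3)
    return alerts


baseline = {"pdp": [1000, 1100, 900, 1000], "editorial": [500, 520, 480, 500], "glossary": [80, 75, 85, 80]}
print(deviation_alerts(baseline, {"pdp": 980, "editorial": 380}))
# {'editorial': -0.24, 'glossary': -1.0}
```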

One additional discipline worth naming: the post-migration retrospective should be written at week eight, not week one, and should be written against the artifacts the pre-launch audit produced. The retrospective compares the pre-launch claim ("we verified canonicals, here is the artifact") against the post-launch reality ("the Search Console Duplicate bucket grew, here is the count") and identifies the audits that were technically passed but operationally insufficient. The retrospective is the input to the next migration, and the discipline of writing it against artifacts rather than memory is what makes the playbook cumulative across projects rather than starting from zero every time.

Key Takeaways

The headline migration checklist (URL mapping, redirects, sitemaps, war room) is necessary but not sufficient. The majority of catastrophic migrations fail on secondary failure modes that the checklist does not enumerate.
The eight common migration archetypes carry materially different risk profiles. Treating migration as a single class of project leads to over-preparation for low-risk changes and under-preparation for high-risk ones.
The under-counted failure modes are canonical persistence, schema markup loss, internal link graph reset, redirect chain length, sitemap-vs-index gap, hreflang breakage, and render-vs-crawl gap on JavaScript-heavy migrations.
A probability-weighted risk model treats each failure mode as a probability times a magnitude, sums the products to an expected drawdown, and prioritizes the audit accordingly. The standard playbook over-invests in redirects and under-invests in the items below them in the contribution stack.
The pre-launch audit produces artifacts (crawl diffs, screenshots, Rich Results Test passes) that are signed off before launch. Artifact orientation is the discipline that separates migrations that succeed from migrations that fail.
Post-launch monitoring must extend to week eight, not week two. The silent failure modes (long-tail erosion, click-through erosion) take six to eight weeks to surface in the data, and the diagnostic context is lost if the team stands down at week two.

Citations and Further Reading

Google Search Central, "Site move with URL changes" and "Consolidate duplicate URLs" documentation, the canonical sources for the redirect, canonical, and indexation mechanics referenced throughout.
Google Search Central, "How to handle redirects (301, 302, etc.)" for the documented behavior on redirect chains and hop counts.
Aleyda Solis, "The SEO Migration Process Guide" (Orainti, multiple editions), a widely used framework for migration planning and execution.
Builtvisible and Distilled migration case studies, published between 2014 and 2022, including the post-mortems on canonical persistence and schema-markup loss patterns.
John Mueller, Search Off the Record podcast, on indexation buckets and the interpretation of the Search Console Coverage report.
Bartosz Goralewicz and the Onely team, published work on rendering, hydration, and the render-vs-crawl gap on JavaScript-heavy sites.
The Search Quality Rater Guidelines (Google, updated periodically), Section 3 on Page Quality and Section 4 on Needs Met, for the editorial framework that drives template-level quality assessment.
Screaming Frog, Sitebulb, and DeepCrawl documentation on internal-link-graph analysis, hreflang validation, and redirect-chain auditing.
The WHATWG HTML Living Standard for the <link> element, and RFC 6596 ("The Canonical Link Relation") for the specification of rel="canonical".
The Web Almanac annual reports (HTTP Archive) for cross-site benchmarks on render performance, schema adoption, and redirect-chain prevalence.
"Google's reasonable surfer model" patent literature and the academic graph-theoretic work on link weighting, for the conceptual background to the canonical-and-redirect interaction.
The IndexNow protocol documentation (Microsoft, Yandex) for the alternative path on rapid recrawl during migration windows.