Your content strategy, your link building, your on-page optimization – none of it matters if the technical foundation underneath is broken.
Technical SEO is the infrastructure that determines whether search engines and AI platforms can actually find, crawl, render and understand your website. A single misconfigured robots.txt file can make your entire site invisible. A slow server response can cause Google to crawl fewer of your pages. A JavaScript rendering issue can mean that AI platforms see a blank page where you see a fully loaded product listing.
And the stakes have gotten higher. In 2026, your site isn’t just being visited by Googlebot and Bingbot. AI crawlers from OpenAI, Anthropic, Perplexity and others are now scanning your content to feed into their answer engines. If those crawlers can’t access or parse your pages, you don’t exist in AI search.
This checklist covers every technical element that matters in 2026 – it's what our Canadian SEO specialists here at Azuro Digital swear by, from the fundamentals that have always been important to the newer considerations specific to AI search. Work through it systematically and fix what's broken:
1. Crawlability
If search engines can’t crawl your pages, nothing else matters. Crawlability is the absolute foundation of technical SEO.
Run a full site crawl. Use Screaming Frog, Ahrefs or Semrush to crawl your entire site. This will surface broken links (404 errors), redirect chains, orphaned pages (pages with no internal links pointing to them), server errors and crawl depth issues. Fix these first before moving on to anything else.
Check Google Search Console. The Pages report in Google Search Console shows you exactly which pages Google has indexed and which it hasn’t – and why. Look for pages marked “Crawled – currently not indexed” or “Discovered – currently not indexed.” These are pages Google found but decided not to add to its index, often because of technical issues or content-related issues.
Monitor your crawl budget. Google allocates a finite crawl budget to each site based on its size and authority. If a large portion of that budget is being wasted on low-value pages (paginated archives, filter pages, session ID URLs), your important pages may not get crawled frequently enough. Use the Crawl Stats report in Google Search Console to see how Google is spending its crawl budget on your site.
Ensure important pages are within three clicks of the homepage. Pages buried deep in your site structure receive less crawl priority and less link equity. If a critical service page requires six clicks to reach from the homepage, both search engines and users will struggle to find it.
2. Robots.txt
Your robots.txt file tells crawlers which parts of your site they can and can’t access. In 2026, it also serves as a governance document for AI crawlers.
Review your robots.txt for accidental blocks. It’s more common than you’d think – a leftover “Disallow: /” from a staging environment can block your entire site from being crawled. Check that your robots.txt isn’t accidentally blocking important pages, your CSS/JS files (which Google needs for rendering) or your media files.
Manage AI bot access deliberately. This is new territory for most site owners. There are now two categories of AI crawlers you need to think about: training bots (which scrape your content to train AI models) and retrieval bots (which fetch your content in real time to answer user queries). You may want to treat these differently.
For example, you might want to allow OAI-SearchBot (OpenAI’s real-time search crawler) access to your content so you appear in ChatGPT search results, while blocking GPTBot (OpenAI’s training crawler) if you don’t want your content used for model training.
The key AI bot user agents to know:
- GPTBot – OpenAI’s training crawler
- OAI-SearchBot – OpenAI’s real-time search crawler
- ChatGPT-User – ChatGPT browsing on behalf of a user
- ClaudeBot – Anthropic’s crawler
- PerplexityBot – Perplexity’s crawler
- Google-Extended – Google's robots.txt control for AI training (not a separate crawler from Googlebot; it governs whether your content can be used to train Google's AI models)
Be intentional about which you allow and which you block. If you block all AI crawlers, you won’t appear in AI-generated answers. If you allow all of them, your content may be used for training without attribution.
Include your sitemap URL in robots.txt. This helps all crawlers – both traditional and AI – discover your content more efficiently.
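Here's a minimal sketch of what a deliberate policy might look like – the paths are placeholders (the /wp-admin/ rules assume a WordPress site), and which bots you allow is entirely your call:

```
# Allow OpenAI's real-time search crawler so you appear in ChatGPT search
User-agent: OAI-SearchBot
Allow: /

# Block OpenAI's training crawler to opt out of model training
User-agent: GPTBot
Disallow: /

# Default rules for all other crawlers
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.example.com/sitemap.xml
```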
3. LLMs.txt
LLMs.txt is a newer protocol – proposed in late 2024 – that serves as a curated guide for AI crawlers. Think of it as a table of contents for AI systems, pointing them to your most important content.
Unlike robots.txt (which controls access), LLMs.txt is written in Markdown and lists your key pages with brief descriptions. When an AI crawler lands on your site, instead of scanning hundreds of pages trying to figure out what’s important, it can reference your LLMs.txt file to quickly identify your highest-value content.
The honest truth: this is still an emerging standard. John Mueller from Google has noted that no AI crawlers have officially confirmed they extract information via LLMs.txt yet. Early adoption data is mixed – some sites report seeing AI crawlers access the file immediately, while others see no measurable impact.
That said, the implementation effort is minimal (it's just a Markdown text file in your root directory), and platforms like Yoast and Webflow have already started building support for it. Anthropic has published an LLMs.txt file for its own Claude documentation. If you have the time to set it up, the downside risk is zero and the potential upside grows as AI search matures.
Include your most important pages: core service pages, pillar content, pricing pages and key resources. Don’t include low-value pages, outdated blog posts or anything you wouldn’t want AI to prioritize.
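As an illustration, a hypothetical LLMs.txt for a local service business might look like this (the business, pages and URLs are invented):

```markdown
# Example Plumbing Co.

> Licensed plumbing services in Toronto, serving residential and
> commercial clients since 2005.

## Services

- [Drain Cleaning](https://www.example.com/services/drain-cleaning): Emergency and scheduled drain services
- [Water Heaters](https://www.example.com/services/water-heaters): Tank and tankless installation and repair

## Resources

- [Pricing](https://www.example.com/pricing): Current rates and estimate requests
- [FAQ](https://www.example.com/faq): Answers to common plumbing questions
```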
4. XML Sitemaps
Your XML sitemap is the roadmap you hand to search engines. It tells them which pages exist on your site and when they were last updated. (The sitemap protocol also supports priority and change-frequency values, but Google ignores those – accurate lastmod dates are what matter.)
Include only indexable, canonical pages. Don’t include pages blocked by robots.txt, pages with noindex tags, redirected URLs or duplicate content. Your sitemap should be a clean list of the pages you actually want indexed.
Keep sitemaps under 50,000 URLs or 50MB. If your site is larger, split it into multiple sitemaps and reference them from a sitemap index file. Organize your sitemaps logically – one for blog posts, one for product pages, one for service pages – so you can monitor indexation by content type.
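As a sketch, a sitemap index referencing per-content-type sitemaps looks like this (URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-services.xml</loc>
    <lastmod>2026-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
    <lastmod>2026-02-03</lastmod>
  </sitemap>
</sitemapindex>
```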
Update your sitemap automatically. Your sitemap should reflect your site’s current state. When you publish a new page, update content or remove a page, the sitemap should update accordingly. Most CMS platforms (WordPress with Yoast or Rank Math, Shopify, etc.) handle this automatically, but verify that it’s actually working.
Submit your sitemap to Google Search Console and Bing Webmaster Tools. Don’t just create the sitemap – make sure search engines know about it. Submit it through both platforms and monitor the indexation reports to see if pages from your sitemap are being picked up.
5. Site Architecture
A clean, logical site architecture helps both search engines and users navigate your content. It also establishes the topical authority signals that AI platforms use to evaluate whether your site is a credible source.
Use a flat hierarchy. Important pages should be reachable within 2-3 clicks from the homepage. Deep, complex navigation structures bury content where crawlers struggle to find it.
Create clear content clusters. Group related content together with a pillar page at the centre and supporting articles linking to and from it. This signals to both Google and AI platforms that your site covers a topic comprehensively. A site with an interlinked cluster of 20 articles about kitchen renovations sends a much stronger topical authority signal than a single standalone article.
Use descriptive, keyword-rich URLs. Keep URLs short, lowercase and hyphen-separated. Include the primary keyword. Avoid parameters, session IDs and unnecessary folder depth.
Implement breadcrumb navigation. Breadcrumbs help users understand where they are in your site hierarchy and provide internal linking benefits. They also help Google and AI systems understand your content’s position within your site structure. Use BreadcrumbList schema markup to make them machine-readable.
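A minimal BreadcrumbList sketch in JSON-LD – the names and URLs are placeholders, and the final item (the current page) can omit its URL:

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://www.example.com/" },
    { "@type": "ListItem", "position": 2, "name": "Services", "item": "https://www.example.com/services/" },
    { "@type": "ListItem", "position": 3, "name": "Kitchen Renovations" }
  ]
}
```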
6. Canonical Tags and Duplicate Content
Duplicate content confuses search engines by splitting ranking signals across multiple URLs that contain the same content. Canonical tags tell search engines which version of a page is the “primary” one.
Set self-referencing canonical tags on every page. Even if a page doesn’t have duplicates, a self-referencing canonical tag prevents issues caused by URL parameters, tracking codes or session IDs creating unintended duplicate URLs.
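The tag itself is a single line in the <head>. For example, a tracking-parameter URL pointing back to its clean version (URL is a placeholder):

```html
<!-- On https://www.example.com/services/?utm_source=newsletter -->
<link rel="canonical" href="https://www.example.com/services/" />
```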
Audit for duplicate content. Common sources of duplication include HTTP vs. HTTPS versions, www vs. non-www versions, trailing slash vs. non-trailing slash URLs, paginated content and filter/sort pages on ecommerce sites. Use your crawling tool to identify pages with identical or near-identical content and consolidate them with canonical tags or 301 redirects.
Make sure canonical tags point to indexable pages. A canonical tag pointing to a no-indexed page or a 404 creates a conflicting signal that confuses crawlers. Audit your canonicals regularly to catch these inconsistencies.
For AI search, duplicate content is particularly problematic because AI platforms may cite inconsistent or outdated versions of your content. Clean canonicalization ensures that when an AI system references your page, it’s pulling from the authoritative version.
7. HTTPS and Security
HTTPS is a confirmed Google ranking signal and a baseline trust requirement. If your site is still on HTTP, fix this immediately.
Ensure your entire site uses HTTPS. Every page, every image, every resource. Mixed content (HTTPS pages loading HTTP resources) triggers browser warnings and undermines trust signals.
Set up proper HTTP-to-HTTPS redirects. All HTTP URLs should 301 redirect to their HTTPS equivalents. Don’t leave both versions accessible.
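As one sketch, assuming an Nginx server, the catch-all redirect looks like this:

```nginx
server {
    listen 80;
    server_name example.com www.example.com;
    # Permanently redirect every HTTP request to its HTTPS equivalent
    return 301 https://$host$request_uri;
}
```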
Keep your SSL certificate current. An expired certificate displays a security warning that drives users away and signals to search engines that your site isn’t properly maintained.
Implement security headers. HTTP security headers like HSTS (HTTP Strict Transport Security), Content-Security-Policy and X-Content-Type-Options add layers of protection and signal to both search engines and AI platforms that your site takes security seriously. Trust is a critical component of E-E-A-T, and technical security contributes to that signal.
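Continuing the hypothetical Nginx setup, the headers can be added like this – the values shown are common starting points, and a full Content-Security-Policy should be tested carefully before deployment:

```nginx
# Force HTTPS for one year, including subdomains (HSTS)
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
# Prevent browsers from MIME-sniffing responses away from the declared type
add_header X-Content-Type-Options "nosniff" always;
# Ask browsers to upgrade any stray HTTP subresources (helps with mixed content)
add_header Content-Security-Policy "upgrade-insecure-requests" always;
```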
8. Core Web Vitals
Google uses Core Web Vitals as part of its page experience ranking signals. These three metrics measure real-world user experience:
Largest Contentful Paint (LCP) – how quickly the largest visible element loads. Target: under 2.5 seconds. Common fixes: optimize and compress images, implement lazy loading, reduce server response times, use a CDN and preload critical resources.
Interaction to Next Paint (INP) – how quickly the page responds when users interact with it (clicking buttons, typing in fields, etc.). Target: under 200 milliseconds. INP replaced First Input Delay as a Core Web Vital in 2024. Common fixes: reduce JavaScript execution time, break up long tasks, minimize main thread blocking and defer non-critical scripts.
Cumulative Layout Shift (CLS) – how much the page layout shifts unexpectedly during loading. Target: under 0.1. Common fixes: set explicit width and height on images and videos, use CSS containment and reserve space for ad slots and dynamic elements.
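To make a couple of those fixes concrete, here's a sketch of image markup that addresses both LCP and CLS (file names are placeholders):

```html
<!-- Explicit width/height reserves layout space (CLS); fetchpriority
     hints the browser to load the above-the-fold hero first (LCP) -->
<img src="hero.webp" width="1200" height="600" fetchpriority="high" alt="Hero banner">

<!-- Below-the-fold images can be lazy-loaded to speed up initial paint -->
<img src="gallery-1.webp" width="400" height="300" loading="lazy" alt="Gallery photo">
```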
Monitor these metrics in Google Search Console’s Core Web Vitals report or in PageSpeed Insights. Fix “Poor” scores first (these are confirmed negative ranking factors), then work on moving “Needs Improvement” scores into the “Good” range.
For AI search, Core Web Vitals matter indirectly. Pages with poor performance rank lower in traditional search, and since AI platforms rely on indexed web content to generate answers, lower-ranking pages are less likely to be discovered and cited.
9. Mobile-First Indexing
Google primarily uses the mobile version of your site for indexing and ranking. If your mobile experience is deficient, your rankings will suffer across both mobile and desktop results.
Use responsive design. Your site should adapt fluidly to all screen sizes. Test on actual mobile devices, not just browser resizing tools.
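Responsive design starts with a correct viewport meta tag in the <head> – without it, mobile browsers render the page at desktop width:

```html
<meta name="viewport" content="width=device-width, initial-scale=1">
```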
Ensure content consistency between mobile and desktop. If content, structured data or metadata is present on your desktop version but missing from your mobile version, Google may not see it. This is a common issue with sites that use separate mobile templates or hide content on smaller screens.
Make tap targets accessible. Buttons and links should be large enough to tap easily on a touchscreen. Google recommends tap targets of at least 48×48 CSS pixels with adequate spacing between them.
Avoid intrusive popups. Interstitials that cover the main content on mobile can trigger a ranking demotion. Google's guidelines specifically target popups that appear immediately on page load or that are difficult to dismiss.
10. JavaScript Rendering
This is one of the most commonly overlooked technical SEO issues, and it’s become even more critical with AI search.
Google renders JavaScript in a two-phase process: first it crawls the raw HTML, then it queues the page for rendering (executing the JavaScript). This rendering queue can introduce delays – sometimes days – before Google sees your fully rendered content.
AI crawlers are even less reliable at rendering JavaScript. If your product descriptions, pricing, FAQs or other critical content only appear after JavaScript executes, AI platforms may see a blank or incomplete page.
Test what crawlers actually see. Use Google Search Console’s URL Inspection tool to compare the raw HTML with the rendered version. If key content is missing from the raw HTML, crawlers may miss it.
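A quick supplementary check is to fetch the raw HTML from the command line and search it for a phrase that should be on the page – the user agent and URL here are illustrative:

```bash
# Fetch the page as GPTBot would (no JavaScript execution),
# then check whether a key phrase exists in the raw HTML
curl -s -A "GPTBot" https://www.example.com/products/widget | grep -i "widget pricing"
```

If grep finds nothing, crawlers that don't render JavaScript won't see that content either.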
Implement server-side rendering (SSR) or pre-rendering for critical content. This ensures that when any crawler – Google, Bing or an AI bot – visits your page, the important content is present in the initial HTML response without requiring JavaScript execution.
Don’t rely on JavaScript for navigation. Internal links should be standard <a href> HTML elements. JavaScript-based navigation (on-click handlers, single-page app routing) may not be followed by all crawlers.
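The difference in practice (router.navigate is a stand-in for whatever client-side routing your framework uses):

```html
<!-- Crawlable: a standard anchor element with a real href -->
<a href="/services/kitchen-renovations">Kitchen Renovations</a>

<!-- Risky: no href, so navigation only happens if JavaScript runs -->
<span onclick="router.navigate('/services/kitchen-renovations')">Kitchen Renovations</span>
```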
Google’s December 2025 rendering update clarified that pages returning non-200 HTTP status codes may be excluded from the rendering pipeline entirely. Make sure your error pages return proper status codes and that your server isn’t accidentally serving 404 or 5xx errors on valid pages.
11. Schema Markup and Structured Data
Schema markup is a standardized vocabulary of code (typically in JSON-LD format) that you add to your pages to tell search engines and AI systems exactly what your content represents.
Schema markup isn't a guaranteed ranking boost – in some cases the impact is minimal or nil – but it can earn rich results in traditional search, and it makes your content far easier for AI systems to interpret. The effort is low, so we still recommend implementing it to have all your bases covered.
Here are some common schema types for SEO:
Article schema – for blog posts and news stories. Includes properties for author, date published, date modified and headline.
FAQ schema – for pages with question-and-answer content. This is one of the lowest-effort, highest-impact schema types you can add.
HowTo schema – for step-by-step instructional content. Helps AI systems extract specific steps from your guides and tutorials.
LocalBusiness schema – for businesses with physical locations. Helpful for local SEO and AI platforms answering location-based queries.
Person schema – for author bio pages. Linking author credentials to your content through Person schema strengthens your E-E-A-T signals.
Product schema – for ecommerce pages. ChatGPT has confirmed it uses structured data to determine which products appear in its results.
Make sure your schema markup matches what’s actually visible on the page. AI systems check for consistency – if your Article schema says “Published: January 2026” but your page shows a different date, that’s a red flag. Validate your markup regularly using Google’s Rich Results Test.
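For instance, a minimal FAQ schema sketch – the question and answer are placeholders and must match the text visible on the page:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How long does a kitchen renovation take?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Most full kitchen renovations take six to ten weeks, depending on scope."
      }
    }
  ]
}
```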
12. Redirect Management
Redirects are a fundamental part of maintaining a healthy site, but they can cause serious problems if mismanaged.
Use 301 redirects for permanent moves. When a page permanently moves to a new URL, use a 301 redirect to transfer the link equity from the old URL to the new one. Avoid 302 (temporary) redirects for permanent changes – they tell search engines the original URL is coming back, which can delay consolidation of signals onto the new URL.
Eliminate redirect chains. A redirect chain happens when URL A redirects to URL B, which redirects to URL C. Each hop in the chain loses a small amount of link equity and adds latency. Audit your redirects and ensure every redirect points directly to the final destination.
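For example, if /old-page once redirected to /interim-page, which now redirects to /new-page, collapse the chain so both point straight at the final destination (Nginx syntax, placeholder paths):

```nginx
location = /old-page     { return 301 /new-page; }
location = /interim-page { return 301 /new-page; }
```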
Fix redirect loops. A redirect loop happens when URL A redirects to URL B, which redirects back to URL A. This creates an infinite loop that makes the page completely inaccessible to both users and crawlers.
Monitor redirects after site migrations. Site migrations and redesigns are the most common source of broken redirects. Map old URLs to new ones carefully and test every redirect after launch.
13. Log File Analysis
Server log files show you exactly how search engines and AI crawlers are interacting with your site – what they’re crawling, how often, which pages they’re hitting and which they’re ignoring.
Identify crawl waste. If Googlebot is spending a disproportionate amount of time crawling low-value pages (old tag archives, filtered URLs, internal search results), that’s crawl budget being taken away from your important pages.
Monitor AI bot activity. Check your logs for GPTBot, ClaudeBot, PerplexityBot and other AI crawlers. See which pages they’re accessing and how frequently. This tells you which of your pages are being fed into AI models and which are being ignored.
Spot crawl anomalies. Sudden drops in crawl frequency can indicate server issues, robots.txt problems or a Google penalty. Sudden spikes in crawling from unknown bots could be scraping activity.
Tools like Screaming Frog’s Log File Analyzer and Semrush’s Log File Analyzer can process log files at scale. For smaller sites, you can analyze log files manually with spreadsheet tools.
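For a small site, even a short script can surface AI bot activity. Here's a minimal sketch in Python, assuming a standard combined-format access log where the user agent is the final quoted field:

```python
import re
from collections import Counter

# AI crawler user-agent substrings to look for (see section 2)
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]

# Combined log format: pull out the request path and the user-agent string
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]*".*"(?P<agent>[^"]*)"$')

hits = Counter()
with open("access.log") as f:
    for line in f:
        m = LINE_RE.search(line.rstrip())
        if not m:
            continue
        for bot in AI_BOTS:
            if bot in m.group("agent"):
                hits[(bot, m.group("path"))] += 1

# Show the most-crawled pages per AI bot
for (bot, path), count in hits.most_common(20):
    print(f"{count:6d}  {bot:15}  {path}")
```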
14. International SEO (Hreflang)
If your site serves content in multiple languages or targets multiple countries, hreflang tags tell search engines which version to show to which audience.
Implement hreflang correctly. Hreflang tags must be reciprocal – if Page A points to Page B as the French version, Page B must point back to Page A as the English version. Non-reciprocal hreflang tags are ignored.
Include a self-referencing hreflang tag. Every page should include a hreflang tag pointing to itself in addition to tags pointing to its alternate-language versions.
Use the correct language and region codes. Use ISO 639-1 language codes (e.g. “en” for English, “fr” for French) and optionally ISO 3166-1 alpha-2 country codes (e.g. “en-CA” for Canadian English, “fr-CA” for Canadian French).
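Put together, the head of the English-Canadian version of a page might contain (URLs are placeholders):

```html
<!-- Self-referencing tag plus alternates; each alternate page must
     carry the mirror-image set pointing back -->
<link rel="alternate" hreflang="en-CA" href="https://www.example.com/en-ca/services/" />
<link rel="alternate" hreflang="fr-CA" href="https://www.example.com/fr-ca/services/" />
<link rel="alternate" hreflang="x-default" href="https://www.example.com/services/" />
```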
For AI search, proper hreflang implementation helps ensure that AI platforms cite the correct language version of your content when generating answers for users in different regions.
15. Ongoing Audit Schedule
Technical SEO isn’t a one-time project. Your site changes, Google’s algorithms update, new AI crawlers emerge and things break. Build a regular audit cadence into your workflow.
Monthly: Check Google Search Console for new crawl errors, review Core Web Vitals scores, monitor for 404 errors and broken links, review indexation status for recently published pages.
Quarterly: Run a full site crawl with Screaming Frog or equivalent. Audit redirect chains and loops. Review robots.txt and sitemap accuracy. Double-check structured data. Analyze server logs for crawl budget issues and AI bot activity.
After every major site change: Any time you launch a redesign, migrate platforms, restructure URLs or make significant changes to navigation or templates, run a full technical audit immediately. Site migrations are the single most common cause of technical SEO catastrophes.
Final Thoughts
Technical SEO in 2026 is about engineering a transparent relationship with Google’s traditional crawlers and the AI agents that are reshaping how people find information.
The fundamentals haven’t changed: make your site fast, crawlable, secure and well-structured. What’s changed is the audience. Your site is now being read by machines that don’t just index pages but synthesize answers from them. If Perplexity or ChatGPT can’t parse your product attributes without rendering heavy JavaScript, you miss the opportunity to provide the answer. If your structured data contradicts your visible content, AI platforms lose trust in your data. If your robots.txt blocks AI crawlers, you simply don’t exist in AI search.
Work through this checklist methodically. Fix the critical issues first – crawlability, indexing, security – then move to performance optimization, structured data and AI-specific considerations. The sites that get the invisible technical foundation right are the ones that get to compete on content, links and brand. Everyone else is building on a cracked foundation.
But technical SEO is only one part of the SEO equation – don’t forget about on-page SEO and link building strategies!