XML Sitemaps: A Complete Guide (with Examples)

Tags: XML sitemap, indexing, technical SEO

An XML sitemap is a file that tells search engines which URLs on your site are important enough to crawl. It doesn't guarantee indexing, but it's the cleanest signal you can give Google about what your site contains and how often it changes.

What goes in (and what doesn't)

Include only URLs that are:

  • Live: return HTTP 200.
  • Indexable: no noindex meta or header.
  • Canonical: the final URL itself, not one that redirects elsewhere.
  • Self-canonical: its rel="canonical" points to itself.

Exclude:

  • Redirected URLs (3xx).
  • Error pages (4xx / 5xx).
  • Pages with noindex.
  • Duplicates that canonical to another URL.
  • Faceted-search URLs you don't want crawled (use robots.txt for these).
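
The include/exclude rules above can be expressed as a small filter. This is a minimal sketch, assuming you already have crawl data for each URL; the `Page` record and its field names are hypothetical and not taken from any particular crawler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """One crawled URL. Field names are illustrative assumptions."""
    url: str
    status: int               # HTTP status code returned for `url`
    noindex: bool             # True if a noindex meta tag or header was seen
    canonical: Optional[str]  # rel="canonical" target, or None if absent


def belongs_in_sitemap(page: Page) -> bool:
    """Apply the rules above: live, indexable, and self-canonical."""
    if page.status != 200:  # drops 3xx redirects and 4xx/5xx errors
        return False
    if page.noindex:
        return False
    # Assumption: a page without a canonical tag is treated as self-canonical.
    return page.canonical in (None, page.url)


pages = [
    Page("https://example.com/", 200, False, "https://example.com/"),
    Page("https://example.com/old", 301, False, None),
    Page("https://example.com/thanks", 200, True, None),
    Page("https://example.com/shoes?color=red", 200, False,
         "https://example.com/shoes"),
]
sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
```

Only the homepage survives here: the other three are a redirect, a noindex page, and a faceted URL that canonicals elsewhere.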

The minimal valid sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/seo-audit</loc>
    <lastmod>2026-04-12</lastmod>
  </url>
</urlset>

That's it. Google ignores <changefreq> and <priority>, so don't bother with them. <lastmod> is the only optional field that still influences crawling, and only if it's accurate. Faking lastmod (e.g. setting today's date on every URL) teaches Google to distrust the field and ignore it.
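
If you generate the file from code rather than by hand, the same structure can be produced with Python's standard library. This is a rough sketch, not a full generator; `build_sitemap` is a hypothetical helper name, and it assumes you can supply a real last-modified date for each URL.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML bytes for (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in entries:
        node = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & and < in text for us.
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/", date(2026, 4, 15)),
    ("https://example.com/blog/seo-audit", date(2026, 4, 12)),
])
```

Passing real modification dates in, rather than calling `date.today()` inside the helper, keeps lastmod honest by construction.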

Limits

  • 50,000 URLs per sitemap file.
  • 50 MB uncompressed per file.
  • If you exceed either, split into multiple sitemaps and reference them from a sitemap index.
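
Splitting by URL count is simple to automate. The sketch below assumes the URLs are already in a list; `split_urls` and `within_size_limit` are hypothetical helper names. Chunking by count does not guarantee the size limit, so the byte check is kept separate.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def split_urls(urls, max_urls=MAX_URLS):
    """Yield consecutive chunks of at most `max_urls` URLs."""
    for start in range(0, len(urls), max_urls):
        yield urls[start:start + max_urls]


def within_size_limit(xml_bytes, max_bytes=MAX_BYTES):
    """Check a serialized, uncompressed sitemap against the size limit."""
    return len(xml_bytes) <= max_bytes


urls = [f"https://example.com/p/{i}" for i in range(120_000)]
chunk_sizes = [len(chunk) for chunk in split_urls(urls)]
```

With 120,000 URLs this produces three files: two full ones and a remainder of 20,000.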

Sitemap index for large sites

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-04-17</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-04-17</lastmod>
  </sitemap>
</sitemapindex>

Group sitemaps by content type (pages, blog, products, categories). When you publish new content, only the affected sitemap's lastmod changes, so Google can see from the index which sitemap changed and prioritize recrawling it.
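
One way to keep each child sitemap's lastmod truthful is to derive it from the newest page inside that sitemap. The following is a sketch under that assumption; `build_index` and the shape of its input dictionary are illustrative, not a standard API.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_index(groups):
    """Build a sitemap index from {sitemap_url: [page lastmod dates]}."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url, page_dates in groups.items():
        node = ET.SubElement(index, "sitemap")
        ET.SubElement(node, "loc").text = sitemap_url
        # A child sitemap's lastmod is the newest page change inside it.
        ET.SubElement(node, "lastmod").text = max(page_dates).isoformat()
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


index_xml = build_index({
    "https://example.com/sitemap-pages.xml": [date(2026, 3, 1), date(2026, 3, 20)],
    "https://example.com/sitemap-blog.xml": [date(2026, 4, 12), date(2026, 4, 17)],
})
```

Publishing a new blog post then moves only the blog sitemap's date; the pages sitemap keeps its older lastmod.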

Where to put it & how to submit

  1. Host at the root: https://example.com/sitemap.xml.
  2. Reference it in robots.txt: Sitemap: https://example.com/sitemap.xml.
  3. Submit in Google Search Console: Sitemaps → enter the URL → Submit. Bing has a similar tool in Bing Webmaster Tools.
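
To confirm that step 2 is in place, Python's built-in robots.txt parser can list the declared sitemaps. A minimal sketch: the robots.txt content below is a made-up example, and in practice you would fetch the live file first.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the rules here are illustrative only.
robots_txt = """\
User-agent: *
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of Sitemap URLs, or None if none are declared
# (available in Python 3.8 and later).
declared = parser.site_maps()
```

If `declared` comes back as `None`, the Sitemap line is missing or malformed.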

Specialized sitemaps

The standard urlset covers most needs. For specific content types Google supports extensions:

  • Image sitemap (image:image), useful only if image search drives meaningful traffic.
  • Video sitemap (video:video), helps Google discover videos and their metadata for rich video results.
  • News sitemap (news:news), only for news publishers that want to appear in Google News.
  • hreflang in sitemaps (xhtml:link), cleaner than per-page hreflang for sites with many languages.
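
As an illustration of the hreflang variant, each <url> entry lists every language version of the page, including itself, as xhtml:link elements. The sketch below generates that structure; `build_hreflang_sitemap` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(alternates):
    """`alternates` maps hreflang code -> URL for one set of translated pages."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:xhtml": XHTML_NS})
    for url in alternates.values():
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        # Every language version lists all versions, including itself.
        for lang, href in alternates.items():
            ET.SubElement(node, "xhtml:link",
                          rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="unicode")


hreflang_xml = build_hreflang_sitemap({
    "en": "https://example.com/en/pricing",
    "de": "https://example.com/de/preise",
})
```

Two languages yield two <url> entries with two links each, which is why this approach scales better in a sitemap than in per-page markup.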

Common mistakes

  • Including noindex URLs: wastes Google's time and sends conflicting signals.
  • Including 301-redirected URLs: same problem.
  • Stale lastmod dates: keep lastmod accurate or omit it.
  • Sitemap blocked in robots.txt. Yes, this happens.
  • HTML "sitemaps" (a page of links) submitted instead of XML. Build both; submit the XML.
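
The blocked-sitemap mistake is easy to test for. This sketch uses Python's standard robots.txt parser; `sitemap_is_blocked` is a hypothetical helper, and the parser's matching is simpler than Googlebot's (no wildcard support), so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def sitemap_is_blocked(robots_txt, sitemap_url, user_agent="Googlebot"):
    """Return True if the robots.txt rules disallow fetching the sitemap URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, sitemap_url)


# Two made-up rule sets: one blocks the sitemap, one does not.
blocking_rules = "User-agent: *\nDisallow: /sitemap.xml\n"
open_rules = "User-agent: *\nDisallow: /cart/\n"

blocked = sitemap_is_blocked(blocking_rules, "https://example.com/sitemap.xml")
allowed = not sitemap_is_blocked(open_rules, "https://example.com/sitemap.xml")
```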

How AuditAI helps

Every audit checks for sitemap presence, validity, and consistency between sitemap URLs and indexable pages. Run an audit free →
