An XML sitemap is a file that tells search engines which URLs on your site are important enough to crawl. It doesn't guarantee indexing, but it's the cleanest signal you can give Google about what your site contains and how often it changes.
What goes in (and what doesn't)
Include only URLs that are:
- Live: return HTTP 200.
- Indexable: no noindex meta tag or header.
- Canonical: the final URL, not one that redirects elsewhere.
- Self-canonical: its rel="canonical" points to itself.
Exclude:
- Redirected URLs (3xx).
- Error pages (4xx / 5xx).
- Pages with noindex.
- Duplicates that canonicalize to another URL.
- Faceted-search URLs you don't want crawled (use robots.txt for these).
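The include/exclude rules above can be sketched as a small filter. This is a minimal illustration, not a crawler: the Page record and its field names are hypothetical, and it assumes you already have each URL's status, noindex state, and canonical from a crawl. Faceted-search URLs are not handled here, since the article leaves those to robots.txt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical crawl record; the field names are illustrative."""
    url: str
    status: int               # HTTP status returned by the URL itself
    noindex: bool             # True if a noindex meta tag or header is present
    canonical: Optional[str]  # value of rel="canonical", or None if absent


def belongs_in_sitemap(page: Page) -> bool:
    """Keep only live, indexable, self-canonical URLs."""
    if page.status != 200:  # drops 3xx redirects and 4xx / 5xx errors
        return False
    if page.noindex:
        return False
    # Assumption: a missing canonical is treated as self-canonical.
    if page.canonical is not None and page.canonical != page.url:
        return False
    return True


pages = [
    Page("https://example.com/", 200, False, "https://example.com/"),
    Page("https://example.com/old", 301, False, None),
    Page("https://example.com/thanks", 200, True, None),
    Page("https://example.com/shoes?color=red", 200, False, "https://example.com/shoes"),
]
sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
```

Only the homepage survives in this sample: the other three are a redirect, a noindex page, and a duplicate whose canonical points elsewhere.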
The minimal valid sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/seo-audit</loc>
    <lastmod>2026-04-12</lastmod>
  </url>
</urlset>
That's it. Google ignores <changefreq> and <priority>, so don't bother with them. <lastmod> is the only optional field that still influences crawling, and only if it's accurate. Lying about lastmod (e.g. setting today's date on every URL) makes Google ignore the field entirely.
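If you generate the file rather than write it by hand, the same structure can be produced with Python's standard library. This is one possible sketch: build_urlset is a hypothetical helper name, the /about URL is made up for illustration, and it assumes you pass a real modification date or None. When the date is unknown the element is omitted rather than guessed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """entries: iterable of (loc, lastmod) pairs; lastmod may be None."""
    # Writing xmlns as a plain attribute keeps the output free of prefixes.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # omit <lastmod> instead of inventing a date
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_urlset([
    ("https://example.com/", "2026-04-15"),
    ("https://example.com/blog/seo-audit", "2026-04-12"),
    ("https://example.com/about", None),  # hypothetical page with no known date
])
```

No <changefreq> or <priority> is emitted, in line with the advice above.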
Limits
- 50,000 URLs per sitemap file.
- 50 MB uncompressed per file.
- If you exceed either, split into multiple sitemaps and reference them from a sitemap index.
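Splitting by the URL limit can be sketched as below. The function names and the generated example URLs are assumptions for illustration. The size check is separate because it depends on the serialized file, not on the URL count.

```python
from itertools import islice

MAX_URLS_PER_FILE = 50_000
MAX_BYTES_PER_FILE = 50 * 1024 * 1024  # 50 MB, measured uncompressed


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Yield successive lists of at most `size` URLs."""
    it = iter(urls)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fits_size_limit(xml_bytes):
    """True if a serialized sitemap is within the uncompressed size limit."""
    return len(xml_bytes) <= MAX_BYTES_PER_FILE


# 120,001 hypothetical URLs need three sitemap files.
all_urls = (f"https://example.com/p/{i}" for i in range(120_001))
chunk_sizes = [len(batch) for batch in chunk_urls(all_urls)]
```

Each batch would then be written to its own file and listed in a sitemap index.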
Sitemap index for large sites
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-04-17</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-04-17</lastmod>
  </sitemap>
</sitemapindex>
Group sitemaps by content type (pages, blog, products, categories). When you push new content, only the affected sitemap's lastmod updates: Google checks the index, sees one changed sitemap, and prioritizes that.
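One way to keep each child sitemap's lastmod honest is to derive it from the pages it contains. The sketch below assumes a hypothetical build_index helper and a dict of page dates per sitemap, with made-up dates. It takes the newest page date as the sitemap's lastmod, so only the group that actually changed moves forward.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_index(groups):
    """groups: dict mapping a child sitemap URL to the lastmod dates of its pages."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url, page_dates in groups.items():
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
        if page_dates:
            # YYYY-MM-DD strings sort chronologically, so max() is the newest.
            ET.SubElement(entry, "lastmod").text = max(page_dates)
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


index_bytes = build_index({
    "https://example.com/sitemap-pages.xml": ["2026-03-02", "2026-04-15"],
    "https://example.com/sitemap-blog.xml": ["2026-04-12", "2026-04-17"],
})
```

Publishing a new blog post changes only the blog entry's lastmod; the pages entry stays at 2026-04-15.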
Where to put it & how to submit
- Host at the root: https://example.com/sitemap.xml.
- Reference it in robots.txt: Sitemap: https://example.com/sitemap.xml.
- Submit in Google Search Console: Sitemaps → enter the URL → Submit. Bing has a similar tool in Bing Webmaster Tools.
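To confirm that the robots.txt reference is in place, Python's standard urllib.robotparser can list the Sitemap lines it finds. The robots.txt content below is an assumed example parsed from a string; against a live site you would fetch the file first.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /search
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the declared Sitemap URLs, or None if there are none.
declared = parser.site_maps()
```

A result of None means the Sitemap line is missing and crawlers have to learn about the file some other way, such as a Search Console submission.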
Specialized sitemaps
The standard urlset covers most needs. For specific content types, Google supports extensions:
- Image sitemap (image:image), useful only if image search drives meaningful traffic.
- Video sitemap (video:video), which helps Google discover videos eligible for rich video results.
- News sitemap (news:news), only for sites accepted into Google News.
- hreflang in sitemaps (xhtml:link), cleaner than per-page hreflang for sites with many languages.
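As an illustration of the hreflang extension, the sketch below writes one url entry per language version, each listing every version including itself. The helper name and the /en/ and /de/ URLs are assumptions. The namespace declarations are written as plain attributes, which serializes correctly but is a shortcut rather than full namespace handling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_urlset(alternates):
    """alternates: dict mapping an hreflang code to that version's URL."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:xhtml": XHTML_NS})
    for loc in alternates.values():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Every version lists all versions, itself included.
        for lang, href in alternates.items():
            ET.SubElement(url, "xhtml:link", rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


hreflang_bytes = build_hreflang_urlset({
    "en": "https://example.com/en/pricing",  # hypothetical URLs
    "de": "https://example.com/de/pricing",
})
```

With many languages the number of link elements grows quadratically, which is worth checking against the file size limit.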
Common mistakes
- Including noindex URLs: wastes Google's time and signals inconsistency.
- Including 301-redirected URLs: same problem.
- Stale lastmod dates: keep them accurate or omit them.
- Sitemap blocked in robots.txt. Yes, this happens.
- HTML "sitemaps" (a page of links) submitted instead of XML. Build both; submit the XML.
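Two of these mistakes are easy to check automatically. The sketch below is a rough heuristic, not a full validator: the function names, the Googlebot user agent string, and the 20-URL threshold are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemap_blocked(robots_lines, sitemap_url, agent="Googlebot"):
    """True if robots.txt rules stop `agent` from fetching the sitemap URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(agent, sitemap_url)


def lastmod_looks_fake(lastmods, min_urls=20):
    """Flag the 'same date on every URL' pattern once there are enough URLs."""
    return len(lastmods) >= min_urls and len(set(lastmods)) == 1


blocked = sitemap_blocked(
    ["User-agent: *", "Disallow: /sitemap"],
    "https://example.com/sitemap.xml",
)
suspicious = lastmod_looks_fake(["2026-04-17"] * 500)
```

A shared date is not proof of a fake lastmod (a site-wide template change is a legitimate cause), so treat the second check as a prompt to look closer.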
How AuditAI helps
Every audit checks for sitemap presence, validity, and consistency between sitemap URLs and indexable pages.