Duplicate content is the same or near-identical text appearing on multiple URLs. It confuses search engines about which version to rank, dilutes PageRank across the copies, and can trigger manual actions for egregious scraping.
What counts as duplicate content?
Google defines duplicate content broadly: substantial blocks of content within or across domains that completely match or are "appreciably similar." This covers:
- Printer-friendly page variants (/print/).
- HTTP vs HTTPS or www vs non-www serving the same content.
- URL parameters that reorder products or track sessions: /shoes?color=red&size=10 vs /shoes?size=10&color=red.
- Pagination (/page/1) that shares content with the root category.
- Syndicated content republished verbatim on partner sites.
- CMS-generated tag, category, and archive pages that aggregate the same articles.
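The parameter-reordering case above is worth making concrete: sorting query parameters collapses every ordering into one canonical form. A minimal sketch in Python using only the standard library (the `normalize_url` name and URLs are illustrative, not from any particular tool):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url: str) -> str:
    """Return a canonical form: query params sorted, fragment dropped."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

# Both parameter orderings collapse to the same canonical URL:
a = normalize_url("https://example.com/shoes?color=red&size=10")
b = normalize_url("https://example.com/shoes?size=10&color=red")
```

The same normalization is useful server-side when generating rel=canonical values for parameterized pages.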
Why it hurts rankings
When Googlebot finds two identical pages, it must decide which to index. It uses
inbound links, PageRank, and canonical hints, but may pick the wrong version.
Worse: link equity splits across duplicates instead of consolidating on the canonical.
Mental model: every duplicate URL is a vote going to the wrong candidate.
Identifying duplicate content
- AuditAI: flags pages with identical <title> or meta description, the fastest proxy for duplication.
- Google Search Console → Pages → "Duplicate, Google chose different canonical."
- Screaming Frog → Bulk Export → filter by content hash collision.
- Siteliner.com: free cross-page duplicate analysis.
- site:yourdomain.com "exact sentence": spot-check for scrapers republishing your content.
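The hash-collision approach in the list above can be approximated in a few lines: hash each page's normalized body text and group URLs whose hashes collide. A sketch with made-up page data (the function names and the whitespace/case normalization are my own choices, not a specific crawler's algorithm):

```python
import hashlib
from collections import defaultdict

def content_hash(text: str) -> str:
    # Normalize whitespace and case so trivial variations still collide
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages: dict) -> list:
    """pages: {url: body_text} -> groups of URLs with identical content."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_hash(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]

dupes = find_duplicates({
    "/shoes?color=red&size=10": "Red shoes, size 10.",
    "/shoes?size=10&color=red": "Red shoes,  size 10.",
    "/about": "About our store.",
})
```

Real crawlers typically hash the extracted main content rather than the raw HTML, so boilerplate (nav, footer) does not mask duplication.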
The canonical tag, your primary fix
For content that must exist on multiple URLs, use rel=canonical to declare
the preferred version:
<!-- On the duplicate -->
<link rel="canonical" href="https://example.com/shoes/red" />
Rules:
- Self-referential canonicals on every page (even the canonical itself).
- Canonical must be an absolute URL with scheme + domain.
- Canonical ≠ noindex: combining them sends conflicting signals.
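The rules above can be checked mechanically. A minimal sketch using only Python's standard library that extracts rel=canonical from an HTML snippet and flags relative URLs (the `CanonicalFinder` and `check_canonical` names are my own):

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical" and self.canonical is None:
            self.canonical = a.get("href")

def check_canonical(html: str):
    """Return (href, is_absolute); absolute means scheme + host present."""
    finder = CanonicalFinder()
    finder.feed(html)
    href = finder.canonical
    if href is None:
        return None, False
    parts = urlsplit(href)
    return href, bool(parts.scheme and parts.netloc)

href, ok = check_canonical(
    '<link rel="canonical" href="https://example.com/shoes/red" />'
)
```

Running this across a crawl export quickly surfaces pages that violate the absolute-URL rule or lack a canonical entirely.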
Fixing www vs non-www and HTTP vs HTTPS
Redirect ALL variants to one canonical origin at the server level:
# Nginx: force HTTPS + www
server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://www.example.com$request_uri;
}
server {
    listen 443 ssl;
    server_name example.com;
    # ssl_certificate and ssl_certificate_key directives required here
    return 301 https://www.example.com$request_uri;
}
# The site itself is served by a separate block:
# listen 443 ssl; server_name www.example.com;
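The invariant behind the config above is simple: every scheme/host variant must 301 to one origin. A toy Python model of that redirect logic (not a replacement for the server config; the canonical origin is assumed to be www + HTTPS):

```python
from urllib.parse import urlsplit

CANONICAL_ORIGIN = "https://www.example.com"  # assumption: www + HTTPS chosen

def redirect_target(url: str) -> str:
    """Model the Nginx rules: every variant 301s to the canonical origin,
    preserving path and query string."""
    parts = urlsplit(url)
    query = "?" + parts.query if parts.query else ""
    return CANONICAL_ORIGIN + parts.path + query

# All origin variants collapse to a single target:
variants = [
    "http://example.com/shoes?size=10",
    "http://www.example.com/shoes?size=10",
    "https://example.com/shoes?size=10",
]
targets = {redirect_target(u) for u in variants}
```

Asserting that the set of targets has size one is exactly the property a post-deploy smoke test should check against the live server.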
Handling syndicated content
- Ask the publisher to add <link rel="canonical" href="your-original">; Medium and Substack both support this.
- Wait at least 24–48 hours after your original is indexed before syndicating.
- Never syndicate before your own URL is crawled; otherwise the syndicated copy may be indexed first.
CMS archive and tag pages
WordPress generates /tag/seo/, /category/tips/, and
/page/2/ pages that duplicate post excerpts. Noindex thin archives:
<meta name="robots" content="noindex, follow" />
Keep follow so PageRank continues to flow through the archive's links; noindex alone is enough to keep the thin archive from competing with the canonical post.
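The decision of which paths get the noindex tag can be expressed as a small rule. A sketch matching the default WordPress paths mentioned above (the `robots_meta` name and regex are my own, and real sites should adapt the patterns to their permalink structure):

```python
import re

# WordPress-style archive paths: /tag/x/, /category/x/, and paginated pages
THIN_ARCHIVE = re.compile(
    r"^/(tag|category)/[^/]+/(page/\d+/)?$|^/page/\d+/$"
)

def robots_meta(path: str) -> str:
    """Return the robots meta value: noindex thin archives, index the rest."""
    if THIN_ARCHIVE.match(path):
        return "noindex, follow"
    return "index, follow"
```

For example, robots_meta("/tag/seo/") yields "noindex, follow" while a post URL like "/my-best-post/" stays indexable.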
Duplicate content checklist
- ☑ Self-referential canonical on every page.
- ☑ 301 redirect www → non-www (or vice versa) at server level.
- ☑ 301 redirect HTTP → HTTPS.
- ☑ URL parameters handled via canonical tags (Google retired the GSC URL Parameters tool in 2022).
- ☑ Printer/PDF variants canonicalized to main page.
- ☑ CMS archive/tag pages noindexed or consolidated.
- ☑ Syndicated content has canonical back to origin.
- ☑ No conflicting canonical + noindex on same page.
Related: Technical SEO Checklist · XML Sitemaps Guide · Internal Linking Strategy
Run a free AuditAI scan to find duplicate-content issues on your site →