Enter two URLs and we measure how much of their content overlaps, as a similarity percentage. Catch near-duplicate pages that split rankings or trip duplicate-content issues — before they cost you traffic.
⚡ Interactive demo — sample data
78% content similarity between the two pages — near-duplicate. Search engines may rank only one. Consolidate or canonicalize.
[Issue] Content similarity: 78% (near-duplicate). These two location pages share most of their body copy.
[Issue] Verdict: search engines will likely pick one version to rank and split signals across both. Add a canonical or give each page unique local content.
[Warning] URL 1: 410 words • URL 2: 395 words. The similar length is consistent with a single template reused with the city name swapped.
[Looks good] Both pages fetched and compared on main content only. Shared menu, header and footer were excluded.
How it works
Enter two URLs
Paste two public URLs separated by a comma — for example 'https://site.com/page-a, https://site.com/page-b'. They can be two pages on your own site, your page versus a competitor's, or a www vs non-www version of the same address. We fetch both.
We compare the visible text
From each page we extract the main readable text — stripping scripts, styling, navigation, headers and footers — then break it into overlapping three-word phrases. Comparing those phrase sets tells us how much wording the two pages genuinely share, not just whether they use the same vocabulary.
Read the similarity score
You get a single content-similarity percentage with a clear verdict: near-duplicate, substantial overlap, or distinct. Use it to decide whether to consolidate the pages, add a canonical tag, rewrite one, or leave them alone.
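The comparison described in the steps above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation. The skipped-tag list, the tokenizer, and the use of Jaccard overlap (shared phrases divided by all distinct phrases) are assumptions; the description only says the main text is split into overlapping three-word phrases and the phrase sets are compared.

```python
import re
from html.parser import HTMLParser

# Tags whose text is ignored. The description names scripts, styling,
# navigation, headers and footers; this exact tag list is an assumption.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer"}


class MainTextExtractor(HTMLParser):
    """Collects text that sits outside any skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_words(html):
    """Return the lowercase words of the page's main text."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return re.findall(r"[a-z0-9']+", text)


def shingles(words, size=3):
    """Return the set of overlapping `size`-word phrases."""
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(html_a, html_b):
    """Jaccard overlap of the two shingle sets, as a 0-100 percentage."""
    a = shingles(extract_words(html_a))
    b = shingles(extract_words(html_b))
    if not a and not b:
        return 0.0  # nothing to compare on either page
    return 100.0 * len(a & b) / len(a | b)
```

Because each word belongs to up to three phrases, swapping a single word removes up to three shared phrases. Two seven-word bodies that differ only in a city name score 25% under this sketch, even though six of their seven words match. That is why phrase overlap is a stricter measure than shared vocabulary.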
What we check
Content similarity percentage — A 0–100% score for how much wording the two pages share, measured by overlapping three-word phrases. It reflects real shared content, not coincidental use of the same common words, so two pages about the same topic written differently will score low.
Near-duplicate threshold (≥70%) — At 70% similarity or higher the pages are flagged as near-duplicates. This is the danger zone where search engines may treat the pages as the same content and pick only one to show. We mark it red.
Substantial overlap (40% to under 70%) — From 40% up to, but not including, 70%, the pages share a lot but aren't identical, which is common with templated pages or product variants. We flag it amber so you can decide whether to differentiate them or set a canonical.
Distinct content (under 40%) — Below 40% the pages are mostly unique and at low duplicate-content risk. We mark it green. Some overlap here is normal — shared headers, calls to action and boilerplate naturally show up across a site.
Word counts for each page — We report how many words we read from each URL. A lopsided count (one page much thinner than the other) is a clue that one page is being fetched incompletely, or that one is genuinely much lighter than the other.
Main content focus — We compare the primary text of each page, not the shared chrome. That matters because two completely different pages on the same site share a menu and footer — focusing on the body keeps that templated boilerplate from inflating the score.
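The three bands and the word-count check above could be expressed as follows. This is a sketch under stated assumptions: the function names are hypothetical, a score of exactly 70 is placed in the near-duplicate band (following the "≥70%" threshold), and the 2.0 ratio used to call word counts lopsided is an illustrative value that the tool does not document.

```python
def verdict(similarity_percent):
    """Map a 0-100 similarity score to one of the three bands."""
    if not 0 <= similarity_percent <= 100:
        raise ValueError("similarity must be between 0 and 100")
    if similarity_percent >= 70:
        return "near-duplicate"       # flagged red
    if similarity_percent >= 40:
        return "substantial overlap"  # flagged amber
    return "distinct"                 # flagged green


def is_lopsided(words_a, words_b, ratio=2.0):
    """Flag when one page has at least `ratio` times the words of the other.

    The 2.0 default is an assumption for illustration only.
    """
    smaller, larger = sorted((words_a, words_b))
    if smaller == 0:
        return larger > 0
    return larger / smaller >= ratio
```

With the demo figures, `verdict(78)` returns `"near-duplicate"` and `is_lopsided(410, 395)` returns `False`, consistent with the sample result of two similar-length pages built from one template.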
Common issues we catch
Templated and boilerplate-heavy pages — Location pages, service-area pages, or category pages built from one template with only a city name swapped often score very high. To search engines they look like the same page repeated. Add genuinely unique content to each — local detail, photos, specifics — so each earns its own place.
www vs non-www and http vs https — The same page served at http and https, or with and without www, is a 100% duplicate of itself. This is one of the most common causes of split ranking signals. Pick one canonical version and 301-redirect the others to it.
Product or color variants — E-commerce stores often have near-identical pages for the same product in different colors or sizes. They'll score high here. The usual fix is a canonical tag pointing the variants at one main product URL, so ranking signals consolidate on a single page.
Scraped or syndicated content — If your article is republished on partner sites — or someone has copied it — the pages will be near-duplicates. Where you've syndicated on purpose, ask partners to add a canonical pointing back to your original so it stays the version that ranks.
Expecting a penalty that doesn't exist — There is no official Google 'duplicate content penalty' for ordinary duplication. The real cost is different: Google picks one canonical version and ignores the rest, and your ranking signals get split across URLs instead of pooling on one. The harm is dilution, not punishment.
Pagination, filters and tracking parameters — URLs with sort, filter, or tracking parameters (?utm_source=, ?sort=) often serve nearly the same content under many addresses. They'll score as near-duplicates. Canonical tags and tidy parameter handling keep search engines focused on the main version.
A low score that's still a problem — Two pages can score under 40% yet still compete for the same keyword because they target the same intent. Low text overlap doesn't rule out keyword cannibalization — if both pages chase the same query, you may still want to consolidate or differentiate them.
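Several of the issues above (www vs non-www, http vs https, tracking and sort parameters) come down to one page being reachable at multiple addresses. A quick way to spot candidates before comparing content is to normalize URLs and see which ones collapse to the same string. The sketch below uses only Python's standard `urllib.parse`. The `normalize_url` helper and the `IGNORED_PARAMS` set are hypothetical, and which parameters are safe to drop depends on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters treated as noise, in addition to anything starting with "utm_".
# This set is an assumption; a sort parameter may matter on some sites.
IGNORED_PARAMS = {"sort", "gclid", "fbclid"}


def normalize_url(url):
    """Collapse scheme, www, trailing slash and noise parameters."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # note: drops any explicit port
    if host.startswith("www."):
        host = host[4:]
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith("utm_")
    ]
    # Treating "/shoes/" and "/shoes" as equal is an assumption.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(sorted(kept)), ""))
```

Two addresses that normalize to the same value are only candidates for duplication. Comparing their content confirms whether the server really returns the same page for both.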
Where this matters
Google & Bing — Both deduplicate near-identical pages by choosing one canonical version to index and rank, ignoring the rest. Neither imposes a standalone penalty for ordinary duplication — the practical effect is that one of your pages disappears from results and signals split until you consolidate.
WordPress, Shopify & e-commerce — These platforms generate duplicate-prone URLs constantly — tag and category archives, product variants, filtered and sorted views. Comparing two suspect URLs here tells you whether you need a canonical tag, a redirect, or a noindex on the lesser version.
Multi-location & franchise sites — Location and service-area pages spun from one template are the classic near-duplicate trap. This check shows how thin the differences really are, so you know which pages need real, unique local content before they'll rank independently.
Migrations & www/https consolidation — After a domain move or HTTPS switch, old and new URLs can both stay live and serve identical content. Comparing them confirms the duplication so you can 301-redirect the old version and consolidate authority on the new one.
Content syndication & PR — When the same article runs on multiple sites, this comparison confirms how close the copies are — the cue to request a cross-domain canonical back to your original so your version stays the one search engines credit.
Frequently asked questions
What counts as duplicate content?
Content that is identical or very similar across two or more URLs. That includes obvious copies (the same article on two pages) and sneakier cases like the same page served at www and non-www, or product variants with near-identical descriptions. This tool measures how close two specific URLs are so you can judge for yourself.
Is there a Google penalty for duplicate content?
No — for ordinary duplication there's no penalty. Google simply picks one version as canonical, indexes that, and sets the others aside. The real cost is that your ranking signals get split across URLs instead of pooling on one, and Google might pick a version you didn't intend. (Deliberately copying content to manipulate rankings is a separate spam issue.)
What similarity percentage should I worry about?
We flag 70% and above as near-duplicate, since that's where search engines are most likely to collapse the pages into one. Scores from 40% up to 70% indicate substantial overlap worth reviewing, especially for templated pages. Under 40% the pages are mostly distinct and at low risk.
How do I fix two pages that are too similar?
You have four main options: consolidate them into one stronger page and 301-redirect the other; add a canonical tag on the duplicate pointing to the version you want to rank; rewrite one so they genuinely differ; or noindex the lesser page if it must stay live for users but shouldn't compete in search.
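After applying a canonical tag or a noindex, it can help to confirm the page's HTML actually carries it. The sketch below reads a page's markup with Python's standard `html.parser` and reports the first `<link rel="canonical">` URL and whether a `<meta name="robots">` tag contains `noindex`. The `HeadSignals` class and `head_signals` function are hypothetical names, and the sketch inspects HTML only. Both signals can also be sent as HTTP response headers, which it does not check.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Records the canonical URL and any robots noindex directive."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link":
            rel_values = attributes.get("rel", "").lower().split()
            if "canonical" in rel_values and self.canonical is None:
                self.canonical = attributes.get("href") or None
        elif tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "").lower().replace(" ", "")
            if "noindex" in content.split(","):
                self.noindex = True


def head_signals(html):
    """Return (canonical_url_or_None, has_noindex) for the given markup."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.noindex
```

For a duplicate page that should defer to a main version, the expected result is a canonical URL equal to the main page's address, with `noindex` left unset.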
Should my product variant pages have canonical tags?
Usually, yes. If color or size variants share nearly all their copy, pointing their canonical tags at one main product URL consolidates ranking signals onto that page instead of spreading them thin. Keep separate indexable pages only where each variant has substantial unique content and search demand.
Does shared header, footer or menu text count against me?
We focus on each page's main content and strip out the navigation, header and footer, so shared site chrome doesn't inflate the score much. Search engines are also good at recognizing boilerplate. It's duplication in the body content that matters, which is what this tool measures.
Can I compare my page against a competitor's?
Yes — just put both URLs in. A high score against a competitor can mean one of you syndicated or copied the other, or that you're both using the same manufacturer or supplier description. It's a useful check for spotting where you need to write something genuinely your own.
Why did two different-looking pages score high?
Most often because the visible body text is largely the same even if the design differs — common with templated location pages or reskinned variants. Layout and styling don't count here; we compare the actual words, so two pages that look different but read the same will score high.
This is one of several free SEO tools from Custom Web Audits.
For a complete, prioritized analysis of your whole website, run a full audit.