Sitemap URL Extractor | Extract URLs Instantly

Use our Sitemap URL Extractor to extract all URLs from any sitemap.xml file for SEO analysis and website optimization. Enter the sitemap URL for immediate results.

In your browser · Updated 05/2026

From URL: Paste the full URL of an XML sitemap or sitemap index, for example https://example.com/sitemap.xml
Paste XML: Useful when the sitemap is behind a login, hosted on an intranet, or you saved a copy locally.
Find via robots.txt: We'll fetch the domain's robots.txt and list every Sitemap: directive we find. Then click any one to extract its URLs.
This tool supports both regular sitemaps and sitemap index files.
Privacy: Nothing you submit is stored on our servers, and results disappear when you close the tab.

Key Features

  • Three input modes: paste a sitemap URL, paste raw XML, or auto-discover sitemaps from a domain's robots.txt
  • Recursive sitemap-index expansion — fetch up to 50 child sitemaps in one click and merge every URL into a single list
  • Per-URL metadata: lastmod, changefreq and priority extracted from the XML when present
  • Live filter (substring or /regex/) plus six sort modes (A→Z, Z→A, longest, shortest, newest lastmod, default)
  • One-click deduplication when the same URL appears in multiple sub-sitemaps
  • At-a-glance stats: total URLs, unique domains, file-extension breakdown, lastmod date range
  • Download the result as TXT (one URL per line), CSV (with all metadata columns) or JSON
  • Copy the full list, just the visible filtered list, or any single URL with one click
  • Handles huge sitemaps (50,000+ URLs) thanks to server-side parsing and a virtualised result table
  • Free, no signup, no account, no logging — useful for SEO audits, migrations, scraping prep and competitor research

Common Use Cases

  • SEO audits — quickly inventory every URL that a site is exposing to search engines
  • Site migrations — produce the full URL list for redirect mapping before relaunching on a new domain or CMS
  • Competitor research — scan a public sitemap to understand a competitor's content footprint, categories and update cadence
  • Crawl prep — feed URLs into Screaming Frog, Sitebulb, custom Python scrapers or any HTTP-checking tool
  • Internal linking audits — pair the URL list with a content audit spreadsheet to find orphan pages
  • Content gap analysis — diff your sitemap against a competitor's to discover topic gaps
  • QA on a new release — verify your CMS is producing the expected sitemap after a deploy
  • Lastmod inventory — spot stale pages by sorting on the lastmod column and identifying entries that haven't been updated in years
  • Bulk indexing requests — export to CSV and submit URLs in batches to the IndexNow protocol or a Google Search Console URL inspection workflow
  • Compliance and accessibility — produce a master URL list for periodic accessibility (WCAG) or privacy reviews
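
As a rough illustration of the content gap use case above, here is a minimal Python sketch that compares two exported URL lists by path. The function names (`path_set`, `content_gaps`) and the sample URLs are made up for this example, and exact path matching is a simplifying assumption: real competitors rarely share slugs, so treat the output as a starting point rather than a finished analysis.

```python
from urllib.parse import urlsplit

def path_set(urls):
    """Reduce URLs to their paths so lists from different domains can be compared."""
    # Trailing slashes are stripped so /pricing and /pricing/ count as the same page.
    return {urlsplit(u).path.rstrip("/") or "/" for u in urls}

def content_gaps(ours, theirs):
    """Return paths that appear in the competitor list but not in ours, sorted."""
    return sorted(path_set(theirs) - path_set(ours))

# Hypothetical sample data standing in for two TXT exports.
ours = [
    "https://example.com/blog/seo-basics",
    "https://example.com/pricing/",
]
theirs = [
    "https://competitor.example/blog/seo-basics",
    "https://competitor.example/blog/link-building",
    "https://competitor.example/pricing",
]

print(content_gaps(ours, theirs))
```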

How to Use

  1. Pick a tab. From URL is the most common path: paste any public sitemap address (most sites publish one at /sitemap.xml).
  2. Optionally toggle Also fetch sub-sitemaps if you suspect the URL is an index file — the tool will follow each child sitemap and merge every URL.
  3. If your sitemap is private or you have it as a file, switch to the Paste XML tab and paste the contents directly.
  4. Don't know where the sitemap is? Use the Find via robots.txt tab and just enter the domain — we'll list every Sitemap: directive declared in robots.txt.
  5. Click Extract URLs. The result table appears with a stats panel above it showing total URLs, unique domains, file-extension breakdown and lastmod range.
  6. Use the filter box to keep only URLs that match a substring (e.g. /blog/) or a regular expression (e.g. /^https:\/\/.+\.pdf$/).
  7. Sort the list with the dropdown — useful for spotting stale pages by lastmod or finding the longest/shortest URLs.
  8. Toggle Dedupe if combined sub-sitemaps included duplicates.
  9. Use the Copy All, TXT, CSV or JSON buttons. CSV preserves lastmod, changefreq and priority columns.
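
Once you have exported the list, the stale-page check from step 7 can also be scripted. The sketch below is one possible approach, not how the tool itself sorts: it assumes each row is a dict with `loc` and `lastmod` keys (hypothetical names) and compares only the date part of lastmod, so entries with a year-only or otherwise unparseable lastmod are skipped.

```python
from datetime import date

def stale_urls(rows, cutoff):
    """Return URLs whose lastmod date is before `cutoff`, oldest first."""
    dated = []
    for row in rows:
        lastmod = row.get("lastmod")
        if not lastmod:
            continue  # no lastmod supplied for this URL
        try:
            # Keep only YYYY-MM-DD so date-only and full timestamps compare cleanly.
            day = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # e.g. a year-only value such as "2019"
        if day < cutoff:
            dated.append((day, row["loc"]))
    return [loc for _, loc in sorted(dated)]

# Hypothetical rows standing in for a JSON or CSV export.
rows = [
    {"loc": "https://example.com/a", "lastmod": "2019-03-01"},
    {"loc": "https://example.com/b", "lastmod": "2025-01-15T09:30:00+00:00"},
    {"loc": "https://example.com/c", "lastmod": None},
    {"loc": "https://example.com/d", "lastmod": "2018-11-20"},
]

print(stale_urls(rows, date(2023, 1, 1)))
```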

Use this tool from your AI agent

Free JSON API and Model Context Protocol (MCP) server. No signup, no API key, CORS open. Designed for Claude, ChatGPT, Cursor, scripts and frontend apps.

curl:

curl -X POST https://mate.tools/api/v1/sitemap-extract.php \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/sitemap.xml"}'

Python:

import urllib.request, json

req = urllib.request.Request(
    "https://mate.tools/api/v1/sitemap-extract.php",
    data=json.dumps({"url":"https://example.com/sitemap.xml"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r))

JavaScript:

const r = await fetch("https://mate.tools/api/v1/sitemap-extract.php", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({"url":"https://example.com/sitemap.xml"}),
});
console.log(await r.json());

Add to claude_desktop_config.json (Claude Desktop), ~/.cursor/mcp.json (Cursor), or any other MCP-compatible client:

{
  "mcpServers": {
    "mate-tools": {
      "command": "npx",
      "args": ["-y", "@mate-tools/mcp-server"]
    }
  }
}
API documentation · OpenAPI 3.1 · npm
Rate limits: 60 req/min · 600 req/hour · 1 MB body cap
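
If you call the API in a loop, a small client-side throttle helps you stay inside the published limits. The sketch below is one way to do it, under assumptions: the `Throttle` class is hypothetical, and how the server actually counts its windows is not documented here, so the spacing is deliberately conservative. At 600 requests per hour, a sustained client needs about 6 seconds between calls, which also satisfies the per-minute limit.

```python
import time

# Spacing that satisfies both limits for sustained use: max(1.0, 6.0) seconds.
SAFE_INTERVAL = max(60 / 60, 3600 / 600)

class Throttle:
    """Space successive calls at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each request; the first call returns at once and later calls sleep only as long as needed.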

Frequently Asked Questions

An XML sitemap is a file that lists every URL a website wants search engines to know about. Most sites publish theirs at https://example.com/sitemap.xml or list it in robots.txt. Use the Find via robots.txt tab if you're not sure.
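
To show what the robots.txt lookup involves, here is a minimal sketch that pulls Sitemap: directives out of a robots.txt body you have already downloaded. It is an illustration, not the tool's own code: the function name is made up, and stripping everything after `#` assumes the sitemap URLs contain no fragment.

```python
def sitemap_directives(robots_txt):
    """Return the URL from every Sitemap: line in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop trailing comments
        key, sep, value = line.partition(":")     # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found

# Hypothetical robots.txt content.
sample = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml  # news section
"""

print(sitemap_directives(sample))
```

Python's standard library offers a similar lookup through `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8 and later) if you would rather not parse the file by hand.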

A sitemap index is a sitemap that points at other sitemaps — large sites split their URLs across many files. This tool detects both. If you submit an index, you can either get back the list of child sitemaps, or tick Also fetch sub-sitemaps to expand them all into one combined URL list (capped at 50 sub-sitemaps for safety).
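
The difference between the two file types comes down to the root element: `<sitemapindex>` for an index, `<urlset>` for a regular sitemap. A minimal sketch of that check follows, assuming the document declares the standard sitemaps.org namespace. The function name and the `MAX_CHILDREN` constant are illustrative; the value 50 simply mirrors the cap described above.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_CHILDREN = 50  # mirrors the sub-sitemap cap described above

def classify(xml_text):
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    tag = root.tag.rsplit("}", 1)[-1]   # strip the "{namespace}" prefix
    if tag == "sitemapindex":
        locs = [e.text.strip() for e in root.findall("sm:sitemap/sm:loc", NS) if e.text]
        return "index", locs[:MAX_CHILDREN]
    if tag == "urlset":
        locs = [e.text.strip() for e in root.findall("sm:url/sm:loc", NS) if e.text]
        return "urlset", locs
    raise ValueError(f"not a sitemap: root element is <{tag}>")

# Hypothetical sitemap index.
INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

print(classify(INDEX))
```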

There's no hard cap on sitemap size, but very large sitemaps (>100,000 URLs) may be slow to render in the browser. The XML parser runs server-side and handles huge files easily; the slow part is rendering the result table.

When present in the XML, we read <lastmod>, <changefreq> and <priority> for each URL. The result table shows lastmod, and the CSV/JSON downloads include all three columns. Sitemap index entries also expose their lastmod when supplied.
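
For readers who want to reproduce this locally, the sketch below reads the same three optional fields with Python's standard XML parser. It is an approximation of the idea, not the tool's implementation: the function name and the dict keys are assumptions, and it expects the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")

def parse_urlset(xml_text):
    """Return one dict per <url>, with None for optional fields that are absent."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS) or ""
        row = {"loc": loc.strip()}
        for field in OPTIONAL_FIELDS:
            value = url.findtext(f"sm:{field}", namespaces=NS)
            row[field] = value.strip() if value else None
        rows.append(row)
    return rows

# Hypothetical sitemap with one fully annotated URL and one bare URL.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

for row in parse_urlset(SAMPLE):
    print(row)
```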

You can filter the results: type any substring (case-insensitive) into the filter box, or wrap a regular expression in slashes (e.g. /\.pdf$/). The result counter, copy and download actions all respect the active filter, so you can extract just the slice you care about.
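
The two filter styles can be approximated in a few lines of Python if you want to post-process an export yourself. This sketch rests on assumptions: the function names are made up, the regex branch is case-sensitive, and any query wrapped in slashes is treated as a regular expression. Under that rule a query such as /blog/ is read as the pattern `blog`; the tool may disambiguate differently.

```python
import re

def make_filter(query):
    """Build a predicate: /pattern/ means regex search, anything else is a substring."""
    if len(query) >= 2 and query.startswith("/") and query.endswith("/"):
        pattern = re.compile(query[1:-1])
        return lambda url: pattern.search(url) is not None
    needle = query.lower()
    return lambda url: needle in url.lower()   # case-insensitive substring match

def apply_filter(urls, query):
    """Return the URLs that match the query, in their original order."""
    keep = make_filter(query)
    return [u for u in urls if keep(u)]

# Hypothetical URL list.
urls = [
    "https://example.com/blog/post",
    "https://example.com/files/guide.pdf",
    "https://example.com/Blog/archive.PDF",
]

print(apply_filter(urls, "blog"))
print(apply_filter(urls, r"/\.pdf$/"))
```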

This tool fetches sitemaps anonymously over HTTPS, so password-protected, IP-restricted or staging sitemaps aren't reachable directly. Open the file in your browser, copy the XML, and use the Paste XML tab.

To remove duplicates, toggle the Dedupe switch in the result toolbar. This is essential when sub-sitemaps overlap, which is common on multilingual sites that re-publish the same canonical URL across language sitemaps.
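
The same idea is easy to apply to lists you have merged yourself. The sketch below keeps the first occurrence of each URL and preserves order. It matches exact strings only, which is an assumption; whether the tool normalises URLs (trailing slashes, letter case) before comparing is not stated here.

```python
from itertools import chain

def merge_and_dedupe(*url_lists):
    """Concatenate several URL lists and keep the first occurrence of each URL."""
    # dict keys preserve insertion order, so this dedupes without reordering.
    return list(dict.fromkeys(chain.from_iterable(url_lists)))

# Hypothetical per-language sub-sitemaps that both list the home page.
english = ["https://example.com/", "https://example.com/en/about"]
german = ["https://example.com/", "https://example.com/de/ueber-uns"]

print(merge_and_dedupe(english, german))
```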

You can run this tool on competitor sites: sitemaps are public files, so any publicly accessible site's sitemap is fair game for analysis. It's an excellent way to map a competitor's content categories, depth and update frequency.

We don't store the sitemaps or URLs you submit. Each request is processed in memory and discarded as soon as the page is rendered. Nothing is logged, queued or persisted. Reload the page and the previous extraction is gone.

TXT (one URL per line) is best for piping into command-line tools or paste-into-form workflows. CSV opens directly in Excel/Sheets and preserves lastmod, changefreq and priority. JSON is the friendliest for scripts (Python, Node, etc.) and round-trips perfectly.
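
If you parse sitemaps in your own scripts, the three formats are simple to produce with the standard library. This is a sketch under assumptions: the column names (`loc`, `lastmod`, `changefreq`, `priority`) and function names are chosen for illustration and may not match the headers in the tool's own downloads.

```python
import csv
import io
import json

FIELDS = ["loc", "lastmod", "changefreq", "priority"]

def to_txt(rows):
    """One URL per line."""
    return "\n".join(row["loc"] for row in rows) + "\n"

def to_csv(rows):
    """Header row plus one line per URL; missing values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Pretty-printed JSON array of row objects."""
    return json.dumps(rows, indent=2)

# Hypothetical parsed rows.
rows = [
    {"loc": "https://example.com/", "lastmod": "2024-01-01", "changefreq": None, "priority": "0.8"},
]

print(to_csv(rows))
```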