URL Extractor API

The URL Extractor API discovers and extracts all URLs from a website's sitemap. Point it at a domain and it handles sitemap discovery and URL aggregation.

This endpoint does not deduct credits.

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer your_api_key

Your API key is available on your Account Page and on the API tab of each recipe. See the main API authentication docs for details.

Endpoint

Full endpoint

https://api.simplescraper.io/v1/extract-urls

The endpoint accepts POST requests with a JSON body.

Request body properties

Property	Required	Type	Description
domain	Yes	String	Full URL with protocol (e.g., `https://example.com`)
urlLimit	No	Number	Maximum number of URLs to return.
sitemapLimit	No	Number	For sitemap-index sites, maximum number of nested sitemaps to traverse. Useful for sampling large sites quickly.
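
The properties above can be assembled with a small helper before sending. This is a minimal sketch, not part of the API: `buildExtractUrlsBody` is a hypothetical name, and the client-side checks (http/https protocol, positive-integer limits) are assumptions that loosely mirror the documented 400 errors. The API still performs its own validation.

```javascript
// Hypothetical helper: builds the JSON body for POST /v1/extract-urls.
// The checks below are assumptions made for illustration; the API validates on its side too.
function buildExtractUrlsBody(domain, options = {}) {
  let parsed;
  try {
    parsed = new URL(domain); // throws when the protocol is missing, e.g. "example.com"
  } catch {
    throw new Error('domain must be a full URL with protocol, e.g. https://example.com');
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('domain must use http or https');
  }

  const body = { domain };
  // Include the optional limits only when the caller supplied them.
  for (const key of ['urlLimit', 'sitemapLimit']) {
    const value = options[key];
    if (value === undefined) continue;
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${key} must be a positive integer`);
    }
    body[key] = value;
  }
  return body;
}

// Sample a large site: read at most 5 nested sitemaps and return at most 100 URLs.
console.log(JSON.stringify(buildExtractUrlsBody('https://example.com', { urlLimit: 100, sitemapLimit: 5 })));
```

The returned object can be passed to `JSON.stringify` and used as the request body.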

Example request

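// Sends a POST request to the URL Extractor endpoint and returns the parsed JSON response.
async function extractUrls(apiKey, domain) {
  const url = 'https://api.simplescraper.io/v1/extract-urls';

  const requestBody = {
    domain: domain, // full URL with protocol
    urlLimit: 100   // optional: maximum number of URLs to return
  };

  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(requestBody)
    });
    const data = await response.json();
    if (!response.ok) {
      // Error responses have the shape { success: false, error: "..." }
      console.error(`request failed (${response.status}):`, data.error);
      return null;
    }
    console.log(data);
    return data;
  } catch (error) {
    // Network failures and unparseable responses end up here
    console.error('error:', error);
    return null;
  }
}

extractUrls('YOUR_API_KEY', 'https://example.com');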

Response structure

Property	Type	Description
success	Boolean	Whether the request succeeded
data	Object	Container for sitemap and URL results
data.sitemapUrl	String or null	URL of the discovered sitemap. Null if no sitemap was found.
data.urls	Array	List of URLs extracted from the sitemap(s)
data.status	String	`"completed"` on success, `"error"` on failure. The Notes section describes two further values: `"no_sitemap_found"` and `"time_limit_reached"`.

Example response

{
  "success": true,
  "data": {
    "sitemapUrl": "https://example.com/sitemap.xml",
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "status": "completed"
  }
}

Error responses

All errors return { success: false, error: "string" }.

Status code	Error	Description
400	Domain is required	The `domain` parameter is missing from the request body.
400	Invalid domain: must be a valid web URL	The `domain` value is not a valid HTTP/HTTPS URL.
401	Invalid API key	API key is missing or invalid.
403	Origin not allowed	No valid API key and the request origin is not allowlisted.
429	Rate limit exceeded	Daily request limit for your tier has been reached. The error message includes the reset time. See "Rate limits" below.
500	Internal Server Error	An unexpected error occurred during sitemap discovery or extraction.
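
One way to surface these errors in client code is to convert a failed response into an `Error` that carries the HTTP status. The sketch below assumes the documented `{ success: false, error: "string" }` shape. `toApiError` and the `retryable` flag are hypothetical names, and which statuses are worth retrying is an assumption rather than documented behavior.

```javascript
// Turns a failed response into an Error carrying the HTTP status.
// `body` is the parsed JSON, expected to look like { success: false, error: "..." }.
function toApiError(status, body) {
  const message = body && typeof body.error === 'string' ? body.error : 'Unknown error';
  const err = new Error(`extract-urls failed (${status}): ${message}`);
  err.status = status;
  // Assumption: 400/401/403 need a fix on the caller's side, so retrying will not help;
  // 429 and 5xx responses may succeed on a later attempt.
  err.retryable = status === 429 || status >= 500;
  return err;
}

const example = toApiError(401, { success: false, error: 'Invalid API key' });
console.log(example.message, example.retryable);
```

In a fetch-based client, this could be called whenever `response.ok` is false, after parsing the body.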

Rate limits

Each response carries these headers:

X-RateLimit-Limit - daily request cap for your tier
X-RateLimit-Remaining - requests left in the current 24-hour window
X-RateLimit-Reset - ISO timestamp when the window resets
X-RateLimit-Concurrency - max simultaneous in-flight requests

Per-tier caps:

Tier	Daily	Concurrent
Active subscription	500	10
Free tier	50	2

Two ways to hit HTTP 429:

Daily cap reached - error message includes the reset time. Wait until then.
Concurrency cap reached - error message says "Too many concurrent requests". Transient. Retry once an in-flight request completes.
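
The headers and the two 429 cases above can be handled with a pair of small helpers. This is a sketch under stated assumptions: `readRateLimit` and `classify429` are hypothetical names, and `classify429` assumes that any 429 whose message does not contain "Too many concurrent requests" is the daily cap. A `Map` stands in for `response.headers` in the example so the block runs without a network call.

```javascript
// Reads the documented X-RateLimit-* headers from any object with a get() method,
// such as the `headers` property of a fetch Response.
function readRateLimit(headers) {
  const toInt = (name) => {
    const raw = headers.get(name);
    return raw == null ? null : Number.parseInt(raw, 10);
  };
  const reset = headers.get('X-RateLimit-Reset');
  return {
    limit: toInt('X-RateLimit-Limit'),
    remaining: toInt('X-RateLimit-Remaining'),
    resetAt: reset ? new Date(reset) : null, // ISO timestamp parsed into a Date
    concurrency: toInt('X-RateLimit-Concurrency')
  };
}

// Distinguishes the two 429 cases by the error message text.
// Assumption: anything other than the concurrency message is the daily cap.
function classify429(errorMessage) {
  return /too many concurrent requests/i.test(String(errorMessage)) ? 'concurrency' : 'daily';
}

// Stand-in for response.headers, using free-tier values from the table above.
const sampleHeaders = new Map([
  ['X-RateLimit-Limit', '50'],
  ['X-RateLimit-Remaining', '12'],
  ['X-RateLimit-Reset', '2030-01-01T00:00:00.000Z'],
  ['X-RateLimit-Concurrency', '2']
]);
console.log(readRateLimit(sampleHeaders));
console.log(classify429('Too many concurrent requests'));
```

For a `'concurrency'` result, retrying once an in-flight request completes is usually enough. For `'daily'`, a client would wait until `resetAt`.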

Notes

No credit cost: This endpoint does not deduct credits from your account.

Sitemap indexes: If the discovered sitemap is a sitemap index, the endpoint fetches nested sitemaps and aggregates all URLs. Use sitemapLimit to cap how many.

URL limits: urlLimit caps the total URLs returned, applied after all sitemaps are processed.

No sitemap found: If no sitemap can be discovered, the response returns HTTP 200 with status: "no_sitemap_found" and a message linking to Deep scraping URLs.

Partial results: Requests run up to 60 seconds. If traversal hasn't finished, the response returns HTTP 200 with status: "time_limit_reached" and the URLs found so far. Most sites finish well under that.

MCP server access: This endpoint is also available via the Simplescraper MCP server as the extract_urls tool. See the MCP server docs for details.
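
The HTTP 200 outcomes described in the notes above (complete results, partial results, and no sitemap) can be told apart by inspecting the status value. The sketch below assumes that `"no_sitemap_found"` and `"time_limit_reached"` appear at `data.status`, the same field documented in the response structure. `summarizeResult` and the shape of its return value are hypothetical.

```javascript
// Interprets a parsed HTTP 200 body.
// Assumption: the status values from the notes are reported at data.status.
function summarizeResult(payload) {
  const data = (payload && payload.data) || {};
  const urls = Array.isArray(data.urls) ? data.urls : [];
  switch (data.status) {
    case 'completed':
      return { urls, complete: true, reason: null };
    case 'time_limit_reached':
      // Traversal stopped at the 60-second limit; urls holds what was found so far.
      return { urls, complete: false, reason: 'time limit reached, results are partial' };
    case 'no_sitemap_found':
      return { urls: [], complete: false, reason: 'no sitemap discovered' };
    default:
      return { urls, complete: false, reason: `unexpected status: ${data.status}` };
  }
}

const partial = summarizeResult({
  success: true,
  data: {
    sitemapUrl: 'https://example.com/sitemap.xml',
    urls: ['https://example.com/page1'],
    status: 'time_limit_reached'
  }
});
console.log(partial.complete, partial.urls.length);
```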
