
URL Extractor API

The URL Extractor API discovers and extracts all URLs from a website's sitemap. Point it at a domain and it handles sitemap discovery and URL aggregation.

This endpoint does not deduct credits.

Authentication

All requests require an API key in the Authorization header:

js
Authorization: Bearer your_api_key

Your API key is available on your Account Page and on the API tab of each recipe. See the main API authentication docs for details.


Endpoint

Full endpoint

https://api.simplescraper.io/v1/extract-urls

The endpoint accepts POST requests with a JSON body.


Request body properties

| Property | Required | Type | Description |
| --- | --- | --- | --- |
| domain | Yes | String | Full URL with protocol (e.g., https://example.com) |
| urlLimit | No | Number | Maximum number of URLs to return. |
| sitemapLimit | No | Number | For sitemap-index sites, maximum number of nested sitemaps to traverse. Useful for sampling large sites quickly. |
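
The properties above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Simplescraper API: `buildRequestBody` is a hypothetical helper, and rejecting non-numeric limits locally is an assumption based only on the documented "Number" type.

```javascript
// Hypothetical helper (not part of the API): builds the JSON body for
// POST /v1/extract-urls from the documented request properties.
function buildRequestBody(domain, options = {}) {
  if (typeof domain !== 'string' || domain.length === 0) {
    throw new Error('Domain is required');
  }

  // The domain must be a full URL with an http or https protocol.
  let parsed;
  try {
    parsed = new URL(domain);
  } catch {
    throw new Error('Invalid domain: must be a valid web URL');
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('Invalid domain: must be a valid web URL');
  }

  const body = { domain };

  // Optional limits are only included when the caller supplies them.
  for (const name of ['urlLimit', 'sitemapLimit']) {
    const value = options[name];
    if (value === undefined) continue;
    // Assumption: limits should be finite numbers; the docs only say "Number".
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      throw new Error(`${name} must be a number`);
    }
    body[name] = value;
  }
  return body;
}
```

Calling `buildRequestBody('https://example.com', { sitemapLimit: 3 })` would produce a body suited to sampling a large site, while a bare hostname such as `example.com` is rejected locally instead of costing a request.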

Example request

js
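// Sends a POST request to the URL Extractor API and returns the parsed JSON response.
async function extractUrls(apiKey, domain) {
  const url = 'https://api.simplescraper.io/v1/extract-urls';

  const requestBody = {
    domain: domain, // full URL with protocol
    urlLimit: 100   // return at most 100 URLs
  };

  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(requestBody)
    });
    // Error responses also carry a JSON body: { success: false, error: "..." }
    const data = await response.json();
    console.log(data);
    return data;
  } catch (error) {
    // Network failures and invalid JSON end up here.
    console.error('error:', error);
  }
}

extractUrls('YOUR_API_KEY', 'https://example.com');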

Response structure

| Property | Type | Description |
| --- | --- | --- |
| success | Boolean | Whether the request succeeded |
| data | Object | Container for sitemap and URL results |
| data.sitemapUrl | String or null | URL of the discovered sitemap. Null if no sitemap was found. |
| data.urls | Array | List of URLs extracted from the sitemap(s) |
| data.status | String | "completed" on success, "error" on failure. Can also be "no_sitemap_found" or "time_limit_reached" (see Notes). |

Example response

json
{
  "success": true,
  "data": {
    "sitemapUrl": "https://example.com/sitemap.xml",
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "status": "completed"
  }
}

Error responses

All errors return { success: false, error: "string" }.

| Status code | Error | Message |
| --- | --- | --- |
| 400 | Domain is required | The domain parameter is missing from the request body. |
| 400 | Invalid domain: must be a valid web URL | The domain value is not a valid HTTP/HTTPS URL. |
| 401 | Invalid API key | API key is missing or invalid. |
| 403 | Origin not allowed | No valid API key and the request origin is not allowlisted. |
| 429 | Rate limit exceeded | Daily request limit for your tier has been reached. The error message includes the reset time. See "Rate limits" below. |
| 500 | Internal Server Error | An unexpected error occurred during sitemap discovery or extraction. |
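
Because every error shares the `{ success: false, error: "string" }` shape, a client can convert non-2xx responses into exceptions in one place. The sketch below is one possible approach under stated assumptions: `ExtractUrlsError` and `unwrapResponse` are hypothetical names, and treating any 2xx status as a non-error is an assumption rather than documented behavior.

```javascript
// Hypothetical error type carrying the HTTP status alongside the API's message.
class ExtractUrlsError extends Error {
  constructor(status, message) {
    super(message);
    this.name = 'ExtractUrlsError';
    this.status = status;
  }
}

// Returns the parsed body for 2xx responses and throws otherwise.
function unwrapResponse(httpStatus, body) {
  if (httpStatus >= 200 && httpStatus < 300) {
    return body;
  }
  // Prefer the API's own error string; fall back to the status code.
  const message =
    body && typeof body.error === 'string' ? body.error : `HTTP ${httpStatus}`;
  throw new ExtractUrlsError(httpStatus, message);
}

// Usage with fetch (apiKey and domain are supplied by the caller).
async function extractUrlsOrThrow(apiKey, domain) {
  const response = await fetch('https://api.simplescraper.io/v1/extract-urls', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ domain })
  });
  // Fall back to null if the body is not valid JSON.
  const body = await response.json().catch(() => null);
  return unwrapResponse(response.status, body);
}
```

Callers can then branch on `error.status`, for example prompting for a new key on 401 and backing off on 429.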

Rate limits

Each response carries the following headers:

  • X-RateLimit-Limit - daily request cap for your tier
  • X-RateLimit-Remaining - requests left in the current 24-hour window
  • X-RateLimit-Reset - ISO timestamp when the window resets
  • X-RateLimit-Concurrency - max simultaneous in-flight requests
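
These headers can be read from a response before deciding whether to send more requests. The following sketch assumes the numeric headers arrive as plain number strings; `readRateLimit` is a hypothetical helper, and it accepts any object with a `get(name)` method, such as the `headers` property of a `fetch` response.

```javascript
// Reads the documented rate-limit headers from any object exposing get(name),
// such as the `headers` property of a fetch Response.
function readRateLimit(headers) {
  // Missing headers become null instead of NaN.
  const toNumber = (value) => (value == null ? null : Number(value));
  const reset = headers.get('X-RateLimit-Reset');
  return {
    limit: toNumber(headers.get('X-RateLimit-Limit')),
    remaining: toNumber(headers.get('X-RateLimit-Remaining')),
    // The reset header is documented as an ISO timestamp.
    resetAt: reset == null ? null : new Date(reset),
    concurrency: toNumber(headers.get('X-RateLimit-Concurrency'))
  };
}
```

With a `fetch` response this would be called as `readRateLimit(response.headers)`, and a `remaining` value of 0 signals that the next request is likely to return HTTP 429.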

Per-tier caps:

| Tier | Daily | Concurrent |
| --- | --- | --- |
| Active subscription | 500 | 10 |
| Free tier | 50 | 2 |

Two ways to hit HTTP 429:

  • Daily cap reached - error message includes the reset time. Wait until then.
  • Concurrency cap reached - error message says "Too many concurrent requests". Transient. Retry once an in-flight request completes.
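
The two 429 cases call for different waits, which can be sketched as follows. This is an illustration under stated assumptions: `planRetryAfter429` is a hypothetical helper, the one-second pause for the concurrency case is an arbitrary choice, and the reset time is taken from the `X-RateLimit-Reset` header rather than parsed out of the error message.

```javascript
// Assumption: how long to pause before retrying a concurrency-limited request.
const CONCURRENCY_BACKOFF_MS = 1000;

// Decides how long to wait after an HTTP 429, based on the error message
// and the X-RateLimit-Reset header value (an ISO timestamp).
function planRetryAfter429(errorMessage, resetHeader, nowMs = Date.now()) {
  if (typeof errorMessage === 'string' &&
      errorMessage.includes('Too many concurrent requests')) {
    // Transient: retry shortly, once an in-flight request has had time to finish.
    return { kind: 'concurrency', waitMs: CONCURRENCY_BACKOFF_MS };
  }
  const resetMs = Date.parse(resetHeader);
  // If the header is missing or unparseable, the wait time is unknown (null).
  const waitMs = Number.isNaN(resetMs) ? null : Math.max(0, resetMs - nowMs);
  return { kind: 'daily', waitMs };
}
```

A daily-cap wait can run to many hours, so a client would typically surface that case to the user or scheduler and reserve automatic retries for the concurrency case.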

Notes

No credit cost: This endpoint does not deduct credits from your account.

Sitemap indexes: If the discovered sitemap is a sitemap index, the endpoint fetches nested sitemaps and aggregates all URLs. Use sitemapLimit to cap how many.

URL limits: urlLimit caps the total URLs returned, applied after all sitemaps are processed.

No sitemap found: If no sitemap can be discovered, the response returns HTTP 200 with status: "no_sitemap_found" and a message linking to Deep scraping URLs.

Partial results: Requests run up to 60 seconds. If traversal hasn't finished, the response returns HTTP 200 with status: "time_limit_reached" and the URLs found so far. Most sites finish well under that.
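
Since "no_sitemap_found" and "time_limit_reached" both arrive with HTTP 200, a client needs to inspect the status field rather than the HTTP code. The sketch below shows one way to do that; `summarizeResult` is a hypothetical helper, and it assumes these status values appear at `data.status`, as in the response table, with `data.urls` holding the partial list.

```javascript
// Maps an HTTP 200 response body to a small summary keyed on data.status.
// Assumption: the non-"completed" statuses also live at data.status.
function summarizeResult(payload) {
  const data = (payload && payload.data) || {};
  const urls = Array.isArray(data.urls) ? data.urls : [];
  switch (data.status) {
    case 'completed':
      return { urls, complete: true, hasSitemap: true };
    case 'time_limit_reached':
      // Partial results: the URLs found before the 60-second limit.
      return { urls, complete: false, hasSitemap: true };
    case 'no_sitemap_found':
      return { urls: [], complete: true, hasSitemap: false };
    default:
      throw new Error(`Unexpected status: ${String(data.status)}`);
  }
}
```

When `complete` is false, repeating the request with a smaller `sitemapLimit` may help it finish within the time limit, at the cost of covering fewer nested sitemaps.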

MCP server access: This endpoint is also available via the Simplescraper MCP server as the extract_urls tool. See the MCP server docs for details.