URL Extractor API
The URL Extractor API discovers and extracts all URLs from a website's sitemap. Point it at a domain and it handles sitemap discovery and URL aggregation.
This endpoint does not deduct credits.
Authentication
All requests require an API key in the Authorization header:
Authorization: Bearer your_api_key
Your API key is available on your Account Page and on the API tab of each recipe. See the main API authentication docs for details.
Endpoint
Full endpoint
https://api.simplescraper.io/v1/extract-urls
The endpoint accepts POST requests with a JSON body.
Request body properties
| Property | Required | Type | Description |
|---|---|---|---|
| domain | Yes | String | Full URL with protocol (e.g., https://example.com) |
| urlLimit | No | Number | Maximum number of URLs to return. |
| sitemapLimit | No | Number | For sitemap-index sites, maximum number of nested sitemaps to traverse. Useful for sampling large sites quickly. |
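The properties above can be combined into a single request body. As a minimal sketch (the helper name `buildExtractUrlsBody` and the limit values are illustrative assumptions, not part of the API), the function below validates the domain on the client side and includes the optional limits only when they are set:

```javascript
// Hypothetical helper: builds the JSON body for a POST to /v1/extract-urls.
function buildExtractUrlsBody(domain, { urlLimit, sitemapLimit } = {}) {
  // The API expects a full URL with protocol; the URL constructor throws on unparsable input.
  const parsed = new URL(domain);
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('domain must be an http(s) URL');
  }
  const body = { domain };
  // Only include the optional limits when the caller sets them.
  if (urlLimit !== undefined) body.urlLimit = urlLimit;
  if (sitemapLimit !== undefined) body.sitemapLimit = sitemapLimit;
  return body;
}

// Illustrative values: at most 100 URLs, sampled from at most 5 nested sitemaps.
const body = buildExtractUrlsBody('https://example.com', { urlLimit: 100, sitemapLimit: 5 });
console.log(JSON.stringify(body));
// prints {"domain":"https://example.com","urlLimit":100,"sitemapLimit":5}
```

Checking the protocol locally is optional; it simply catches the case the API would otherwise reject with a 400.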
Example request
async function extractUrls(apiKey, domain) {
  const url = 'https://api.simplescraper.io/v1/extract-urls';
  // domain must be a full URL with protocol; urlLimit is optional
  const requestBody = {
    domain: domain,
    urlLimit: 100
  };
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(requestBody)
    });
    // Error responses are also JSON: { success: false, error: "..." }
    const data = await response.json();
    console.log(data);
  } catch (error) {
    // Network failures and invalid JSON end up here
    console.error('error:', error);
  }
}
extractUrls('YOUR_API_KEY', 'https://example.com');
Response structure
| Property | Type | Description |
|---|---|---|
| success | Boolean | Whether the request succeeded |
| data | Object | Container for sitemap and URL results |
| data.sitemapUrl | String or null | URL of the discovered sitemap. Null if no sitemap was found. |
| data.urls | Array | List of URLs extracted from the sitemap(s) |
| data.status | String | "completed" on success, "error" on failure. Can also be "no_sitemap_found" or "time_limit_reached" (see Notes). |
Example response
{
  "success": true,
  "data": {
    "sitemapUrl": "https://example.com/sitemap.xml",
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "status": "completed"
  }
}
Error responses
All errors return { success: false, error: "string" }.
| Status code | Error | Description |
|---|---|---|
| 400 | Domain is required | The domain parameter is missing from the request body. |
| 400 | Invalid domain: must be a valid web URL | The domain value is not a valid HTTP/HTTPS URL. |
| 401 | Invalid API key | API key is missing or invalid. |
| 403 | Origin not allowed | No valid API key and the request origin is not allowlisted. |
| 429 | Rate limit exceeded | Daily request limit for your tier has been reached. The error message includes the reset time. See "Rate limits" below. |
| 500 | Internal Server Error | An unexpected error occurred during sitemap discovery or extraction. |
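One way to act on the table above is to map each failure to a next step. The sketch below is an assumption about reasonable client behavior, not documented API guidance: `classifyError` and its `reason` labels are hypothetical, and treating 500 as retryable is a judgment call. The 429 branch relies on the "Too many concurrent requests" message described under Rate limits.

```javascript
// Hypothetical helper: maps a failed response to a suggested next step.
// `status` is the HTTP status code; `body` is the parsed JSON ({ success: false, error: "..." }).
function classifyError(status, body) {
  const message = (body && body.error) || '';
  switch (status) {
    case 400:
      // Missing or invalid domain: the request itself must change.
      return { retry: false, reason: 'fix-request', message };
    case 401:
    case 403:
      // Missing/invalid API key or non-allowlisted origin.
      return { retry: false, reason: 'check-credentials', message };
    case 429:
      // Concurrency cap is transient; the daily cap only clears at the reset time.
      return message.includes('Too many concurrent requests')
        ? { retry: true, reason: 'concurrency-cap', message }
        : { retry: false, reason: 'daily-cap', message };
    case 500:
      // Assumption: an unexpected server error may succeed on a later attempt.
      return { retry: true, reason: 'server-error', message };
    default:
      return { retry: false, reason: 'unknown', message };
  }
}

console.log(classifyError(400, { success: false, error: 'Domain is required' }));
```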
Rate limits
Each response carries:
- X-RateLimit-Limit - daily request cap for your tier
- X-RateLimit-Remaining - requests left in the current 24-hour window
- X-RateLimit-Reset - ISO timestamp when the window resets
- X-RateLimit-Concurrency - max simultaneous in-flight requests
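These headers can be read from any fetch response. The following is a rough sketch, assuming the three count headers hold plain numeric strings (only the reset header's ISO format is stated above); `readRateLimit` is a hypothetical helper and the sample header values are made up for illustration.

```javascript
// Hypothetical helper: reads the four rate-limit headers from a Headers-like object.
// `now` is injectable so the remaining wait can be computed deterministically.
function readRateLimit(headers, now = Date.now()) {
  const num = (name) => {
    const raw = headers.get(name);
    return raw == null ? null : Number(raw); // assumption: numeric string values
  };
  const resetRaw = headers.get('X-RateLimit-Reset');
  const resetAt = resetRaw == null ? null : new Date(resetRaw);
  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    concurrency: num('X-RateLimit-Concurrency'),
    resetAt,
    // Milliseconds until the window resets, never negative.
    msUntilReset: resetAt === null ? null : Math.max(0, resetAt.getTime() - now),
  };
}

// Illustrative header values, not real API output.
const sampleHeaders = new Headers({
  'X-RateLimit-Limit': '500',
  'X-RateLimit-Remaining': '499',
  'X-RateLimit-Reset': '2030-01-01T00:00:00.000Z',
  'X-RateLimit-Concurrency': '10',
});
const info = readRateLimit(sampleHeaders, Date.parse('2029-12-31T23:00:00.000Z'));
console.log(info.remaining, info.msUntilReset);
```

With a real response, pass `response.headers` as the first argument.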
Per-tier caps:
| Tier | Daily | Concurrent |
|---|---|---|
| Active subscription | 500 | 10 |
| Free tier | 50 | 2 |
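To stay under the concurrent cap when extracting URLs for many domains, a client can limit how many requests are in flight at once. The sketch below shows one generic way to do that; `runWithConcurrency` is a hypothetical helper, and the task in the usage example is a stand-in rather than a real API call.

```javascript
// Runs `task(item)` for every item with at most `maxConcurrent` tasks in flight.
// Results are returned in the same order as `items`.
async function runWithConcurrency(items, maxConcurrent, task) {
  const results = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }
  const workerCount = Math.min(maxConcurrent, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage sketch: a cap of 2 matches the free tier in the table above.
// A real task would POST each domain to the extract-urls endpoint.
const domains = ['https://a.example', 'https://b.example', 'https://c.example'];
runWithConcurrency(domains, 2, async (domain) => domain.length).then(console.log);
```

If any task rejects, the returned promise rejects as well, so a caller that wants per-domain error handling should catch inside the task.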
Two ways to hit HTTP 429:
- Daily cap reached - error message includes the reset time. Wait until then.
- Concurrency cap reached - error message says "Too many concurrent requests". Transient. Retry once an in-flight request completes.
Notes
No credit cost: This endpoint does not deduct credits from your account.
Sitemap indexes: If the discovered sitemap is a sitemap index, the endpoint fetches nested sitemaps and aggregates all URLs. Use sitemapLimit to cap how many.
URL limits: urlLimit caps the total URLs returned, applied after all sitemaps are processed.
No sitemap found: If no sitemap can be discovered, the response returns HTTP 200 with status: "no_sitemap_found" and a message linking to Deep scraping URLs.
Partial results: Requests run for up to 60 seconds. If traversal hasn't finished by then, the response returns HTTP 200 with status: "time_limit_reached" and the URLs found so far. Most sites finish well under that limit.
MCP server access: This endpoint is also available via the Simplescraper MCP server as the extract_urls tool. See the MCP server docs for details.
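Because "no_sitemap_found" and "time_limit_reached" both arrive with HTTP 200, a client needs to inspect the status value rather than the status code. A minimal sketch follows, assuming these values appear in data.status alongside "completed" (the notes above do not say exactly where the field sits); `summarizeExtraction` is a hypothetical helper.

```javascript
// Hypothetical helper: turns a parsed HTTP 200 response into a small summary.
// Assumption: every status value is reported in response.data.status.
function summarizeExtraction(response) {
  const data = (response && response.data) || {};
  const urls = Array.isArray(data.urls) ? data.urls : [];
  switch (data.status) {
    case 'completed':
      return { complete: true, urls, note: 'all sitemaps processed' };
    case 'time_limit_reached':
      // Partial results: keep the URLs found before the 60-second limit.
      return { complete: false, urls, note: 'partial results after the time limit' };
    case 'no_sitemap_found':
      return { complete: false, urls: [], note: 'no sitemap discovered' };
    default:
      return { complete: false, urls, note: `unexpected status: ${data.status}` };
  }
}

const summary = summarizeExtraction({
  success: true,
  data: {
    sitemapUrl: 'https://example.com/sitemap.xml',
    urls: ['https://example.com/page1'],
    status: 'time_limit_reached',
  },
});
console.log(summary.complete, summary.urls.length);
```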