Using the Simplescraper API
The Simplescraper API allows you to extract structured data programmatically from web pages. This guide covers how to use the API effectively for your data extraction needs.
If you haven't already created a scrape recipe, please read this guide before continuing.
Authentication
All requests to the API require the inclusion of an API key. The API key should be sent in the `Authorization` header using the `Bearer` token format:
Authorization: Bearer your_api_key
An API key is provided when you sign up to Simplescraper and can be found on the API tab of each recipe that you create. Code examples of how to include the API key in requests are provided in the sections below.
Request structure
POST Request structure
For POST requests, the API key should be sent in the `Authorization` header using the `Bearer` token format, and the `Content-Type` header must be set to `application/json`.
Here's an example of a POST request to the `/recipes/:recipeId/run` endpoint:
const apikey = 'ap1k3y';
const recipeId = '12345';
const sourceUrl = 'https://example.com';
const url = `https://api.simplescraper.io/v1/recipes/${recipeId}/run`;

const requestBody = {
  sourceUrl: sourceUrl,
  // other optional properties can be included here
  extractMarkdown: false,
  runAsync: false,
};

const response = await fetch(url, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apikey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(requestBody)
});
// handle response...
Notes:
- URL encoding of the `sourceUrl` is not necessary when sent in the request body, as JSON handles special characters automatically.
- Set the `Content-Type` header to `application/json` for POST requests.
GET Request structure
For GET requests, the API key should be sent in the `Authorization` header using the `Bearer` token format. Other properties should be passed as query parameters, and encoded where necessary.
As an example, here's a GET request to the `/results/:resultsId` endpoint:
const apikey = 'ap1k3y';
const resultsId = '12345';
const requestUrl = `https://api.simplescraper.io/v1/results/${resultsId}`;

const response = await fetch(requestUrl, {
  headers: {
    'Authorization': `Bearer ${apikey}`
  }
});
// handle response...
Endpoints
The base URL for all API requests is https://api.simplescraper.io/v1, followed by the specific endpoint. The table below lists all available endpoints; append them to the base URL.
Endpoint | Method | Description |
---|---|---|
/recipes/:recipeId | GET | Get information about a recipe |
/recipes/:recipeId/run | POST | Run the specified scrape recipe and return results or status |
/recipes/:recipeId/results-latest | GET | Retrieve the most recent result for a specific recipe |
/recipes/:recipeId/results-history | GET | Retrieve recipe IDs, scrape dates, and number of pages scraped for the last 100 runs of the specified recipe |
/recipes/:recipeId/batch/urls | POST | Update or replace batch scraper (crawler) URL list |
/results/:resultsId | GET | View results or scrape progress for the specified results ID. The status key indicates the progress of the scrape task |
/smart-extract | POST | Extract data and reusable CSS selectors from any website using AI |
For example, the full URL to run a scrape recipe looks like this:
https://api.simplescraper.io/v1/recipes/:recipeId/run
Note: Replace `:recipeId` and `:resultsId` with actual IDs when making requests.
Further information about interacting with each endpoint is explained in the 'Endpoint details' section below.
Endpoint details
POST /recipes/:recipeId/run
Send POST requests to this endpoint to initiate a scrape run for a recipe and return data.
If the scrape time exceeds 90 seconds, a JSON object containing a `results_id` and a status of `'running'` is returned, which can then be polled at https://api.simplescraper.io/v1/results/:resultsId.
Request body properties
The following properties should be sent in the request body as JSON:
Property | Required | Type | Description |
---|---|---|---|
sourceUrl | No | URL | The URL of the page to be scraped. This will update the current URL of the recipe. If not included, the existing recipe URL is used. |
runAsync | No | Boolean | If true, returns a result ID immediately and runs the scrape task asynchronously. The result ID can then be used to poll the /results/:resultsId endpoint. |
extractMarkdown | No | Boolean | If true, a markdown version of the page will be extracted in addition to structured data (see https://simplescraper.io/docs/extract-markdown) |
offset | No | Number | The starting point for results retrieval. |
useCrawler | No | Boolean | If true, scrape URLs that have been added to the crawler via the crawler tab of a recipe, instead of the source URL. Default false. |
Example POST request
async function runRecipe(apikey, recipeId, sourceUrl) {
const url = `https://api.simplescraper.io/v1/recipes/${recipeId}/run`;
const requestBody = {
sourceUrl: sourceUrl,
runAsync: false
};
try {
const response = await fetch(url, {
method: 'POST',
headers: {
'Authorization': `Bearer ${apikey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify(requestBody)
});
const data = await response.json();
console.log(data);
} catch (error) {
console.error('error:', error);
}
}
// call function
runRecipe('YOUR_API_KEY', 'YOUR_RECIPE_ID', 'https://example.com');
Notes:
GET requests to `/recipes/:recipeId/run` are also supported for systems that can't make POST requests, however POST is recommended whenever possible. When sending a GET request, pass properties as URL parameters and ensure proper URL encoding of the `sourceUrl`.
Example GET request
async function runRecipe(apiKey, recipeId, sourceUrl) {
  const url = `https://api.simplescraper.io/v1/recipes/${recipeId}/run?sourceUrl=${encodeURIComponent(sourceUrl)}`;
  try {
    const response = await fetch(url, {
      headers: {
        'Authorization': `Bearer ${apiKey}`
      }
    });
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.error('error:', error);
  }
}

// call function
runRecipe('YOUR_API_KEY', 'YOUR_RECIPE_ID', 'https://example.com');
Response structure
Property | Example Value | Explanation | Options/Types |
---|---|---|---|
recipe_id | "rtJjthGverod4EQkt4t4d" | ID of the recipe being scraped | String |
results_id | "pAioZevQJaqpjod4EQkd" | Unique identifier for the results of the scrape task | String |
name | "Example recipe" | Name of the recipe | String |
url | "https://example.com" | source URL of the recipe | String (valid URL) |
date_scraped | "2024-08-22T09:41:00.000Z" | Start time of the scrape operation | String (ISO 8601 date format) |
status | "completed" | Current status of the scrape job | String: "completed", "failed", "running" |
data | [...] | Main payload of scraped data | Array of objects (structure depends on scrape target) |
screenshots | [{ "url_uid": 1, "url": "https://...", "screenshot": "https://" }] | Screenshots of each page scraped | Array of objects |
errors | [ { url: '', page_message: 'cannot find element', response_code: 200, screenshot: '' } ] | Error details for pages that did not return data successfully | Array of objects |
Example response (no timeout)
{
"recipe_id": "rtJjthGverod4EQkt4t4d",
"results_id": "pAioZevQJaqpjod4EQkd",
"name": "Example scrape recipe",
"url": "https://example.com",
"date_scraped": "2024-08-22T09:41:00.000Z",
"status": "completed",
"status_code": 200,
"data": [...],
"screenshots": [
{
"url_uid": 1,
"url": "https://...",
"screenshot": "https://..."
}
],
"errors": [
{
"url_uid": 1,
"url": "https://...",
"page_message": "",
"response_code": "",
"screenshot": "https://..."
}
]
}
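Once a run completes, the `data` array is the main payload, with one object per scraped record. Below is a minimal sketch of consuming a completed response, assuming `result` holds the parsed JSON above; the record fields themselves depend on your recipe.
function processRunResult(result) {
  if (result.status !== 'completed') {
    console.log(`Run not finished yet, status: ${result.status}`);
    return;
  }
  // each item in `data` is one scraped record; its keys depend on the recipe
  for (const record of result.data) {
    console.log(record);
  }
  // surface any per-page failures reported alongside the data
  for (const err of result.errors ?? []) {
    console.warn(`Page ${err.url} failed: ${err.page_message}`);
  }
}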
Example response (timeout)
{
"status": "running",
"results_id": "r4t9iyofr234rtr9j",
"message": "The task is still running. Please check status at https://api.simplescraper.io/v1/results/r4t9iyofr234rtr9j or await webhook notification if configured."
}
GET /results/:resultsId
Full endpoint: https://api.simplescraper.io/v1/results/:resultsId
Get results for a particular scrape task based on the results ID.
Check the `status` property for a value of `completed` to determine whether the task has finished; a status of `running` indicates the task is still in progress.
Parameters
Parameter | Required | Description |
---|---|---|
apikey | Yes | The API key for user authentication. |
limit | No | The maximum number of results to return. |
offset | No | The starting point for results retrieval. |
Example request
async function getResults(apikey, resultsId) {
  const url = `https://api.simplescraper.io/v1/results/${resultsId}`;
  try {
    const response = await fetch(url, {
      headers: {
        'Authorization': `Bearer ${apikey}`
      }
    });
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.error('error:', error);
  }
}
// Usage
getResults('YOUR_API_KEY', 'YOUR_RESULTS_ID');
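The `limit` and `offset` parameters are passed as query string values. Here's a hedged sketch of paging through a large result set; the page size of 100 is an arbitrary choice for illustration, not an API default.
// fetch one page of results using limit/offset query parameters
async function getResultsPage(apikey, resultsId, limit = 100, offset = 0) {
  const url = `https://api.simplescraper.io/v1/results/${resultsId}?limit=${limit}&offset=${offset}`;
  const response = await fetch(url, {
    headers: {
      'Authorization': `Bearer ${apikey}`
    }
  });
  return response.json();
}

// usage: fetch the second page of 100 results
getResultsPage('YOUR_API_KEY', 'YOUR_RESULTS_ID', 100, 100).then(console.log);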
Example response
// same as a successful call to /recipes/:recipeId/run
{
"recipe_id": "rtJjthGverod4EQkt4t4d",
"results_id": "pAioZevQJaqpjod4EQkd",
"name": "Example scrape recipe",
"url": "https://example.com",
"date_scraped": "2024-08-22T09:41:00.000Z",
"status": "completed",
"data": [...],
"screenshots": [...],
"errors": [...]
}
POST /recipes/:recipeId/batch/urls
Update or replace the batch scraper (crawler) URLs. The endpoint allows adding new URLs or replacing existing ones in the batch collection. Use the `batch_mode` parameter to specify whether to `append` or `replace` URLs.
The endpoint returns the result of the operation, including success status, a summary of new and existing URLs, and any invalid URLs with detailed error information.
Note that by default, when running a recipe, the Simplescraper API does not use the batch scraper (crawler) unless the `useCrawler` flag is specified. See the '/recipes/:recipeId/run' endpoint section for more details.
Parameters
Parameter | Required | Description |
---|---|---|
apikey | Yes | The API key for user authentication. |
batch_mode | No | Operation mode, either `append` (default) or `replace`. |
batch_urls | Yes | An array of URLs to be processed in the batch operation. |
Example request
async function addBatchUrls(apiKey, recipeId, batchMode, urls) {
const url = `https://api.simplescraper.io/v1/recipes/${recipeId}/batch/urls`;
try {
const response = await fetch(url, {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
batch_mode: batchMode,
batch_urls: urls
})
});
const data = await response.json();
console.log(data);
} catch (error) {
console.error('error:', error);
}
}
// Usage
addBatchUrls('YOUR_API_KEY', 'YOUR_RECIPE_ID', 'append', ['https://zombo.com/page1', 'https://zombo.com/page2']);
Example response
{
"success": true, // true, false 'partial'
"summary": {
"totalExisting": 150,
"totalNew": 30,
"totalErrors": 2
},
"data": {
"newUrls": [
"https://zombo.com/page1",
"https://zombo.com/page2"
],
"errorDetails": [
{
"url": "https://zombo.com/badurl",
"message": "Invalid URL format",
"type": "INVALID_URL_FORMAT"
},
{
"url": "https://zombo.com/anotherbadurl",
"message": "Invalid URL format",
"type": "INVALID_URL_FORMAT"
}
]
}
}
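Because `success` can be `'partial'`, it's worth checking the summary and error details rather than treating any non-failure response as complete. A minimal sketch, assuming `result` holds the parsed response above:
// inspect a batch-update response for partial failures
function checkBatchResult(result) {
  if (result.success === true) {
    console.log(`All URLs accepted (${result.summary.totalNew} new).`);
  } else if (result.success === 'partial') {
    console.warn(`${result.summary.totalErrors} URL(s) rejected:`);
    for (const err of result.data.errorDetails) {
      console.warn(`${err.url}: ${err.message} (${err.type})`);
    }
  } else {
    console.error('Batch update failed.');
  }
}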
Notes
- batch_mode: `append` mode will add new URLs to the existing list, with a cap of 5000 URLs. `replace` mode will clear existing URLs and replace them with the provided list, enforcing the same limit.
- Validation: URLs are validated for format, length, protocol, and TLD correctness. Invalid URLs are listed in the response's `errorDetails` array.
- Limit: A maximum of 5000 URLs can be stored in a batch at any time. Attempting to exceed this limit will result in trimming of the excess URLs (see the sketch below).
POST /smart-extract
Full endpoint: https://api.simplescraper.io/v1/smart-extract
Smart Extract uses AI to accurately extract data and generate reusable CSS selectors from any website using only a list of the data points you need (data schema). Read more about this feature here: https://simplescraper.io/docs/smart-data-extract.
Request body properties
Property | Required | Description |
---|---|---|
url | Yes | The URL of the page to be scraped. |
schema | Yes | A comma-separated list of properties to extract. |
Example request
async function runSmartExtract(apikey, url, schema) {
const endpoint = 'https://api.simplescraper.io/v1/smart-extract';
const requestBody = {
url: url,
schema: schema
};
try {
const response = await fetch(endpoint, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${apikey}`
},
body: JSON.stringify(requestBody)
});
const data = await response.json();
console.log(data);
} catch (error) {
console.error('error:', error);
}
}
// usage
let url = 'https://example.com/product-page';
let schema = 'product name, price, description, rating';
runSmartExtract('YOUR_API_KEY', url, schema);
Example response
{
"extract_uid": "12345",
"results_uid": "54321",
"date_completed": "2024-01-01T00:00:00.000Z",
"data": [
{
"product_name": "Amazon apples",
"price": "USD $1,400",
"description": "A great selection of Gala, Fuji, Honeycrisp, Golden Delicious & more",
"rating": "4.5",
},
],
"selectors": [
{
"name": "product_name",
"selector": "div.displayProduct",
"uid": "e5de-60cd-4386-917b"
},
{
"name": "price",
"selector": "div.displayListingPrice",
"uid": "ea74-d519-4508-b5ca"
},
{
"name": "description",
"selector": "div.description",
"uid": "4cc4-d406-4dd1-887f"
},
{
"name": "rating",
"selector": "div.feature-item:nth-child(2)",
"uid": "c720-a90e-44dd-9eb4"
}
],
"status": "completed"
}
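The `selectors` array is what makes Smart Extract reusable: each entry pairs a field name with a CSS selector that can be stored and re-applied to similar pages without another AI pass. A hedged sketch of applying them in a browser context (assumes the target page's structure still matches the selectors):
// apply returned selectors to the current page, e.g. in a userscript or headless browser
function applySelectors(selectors) {
  const record = {};
  for (const { name, selector } of selectors) {
    const el = document.querySelector(selector);
    record[name] = el ? el.textContent.trim() : null;
  }
  return record;
}

// usage with the response above:
// const record = applySelectors(response.selectors);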
Request timeouts
Requests to the `/recipes/:recipeId/run` endpoint will time out after 90 seconds, at which point a JSON object containing a `results_id` and a status of `'running'` is returned.
The `results_id` can then be used to poll the `/results/:resultsId` endpoint for a status of `'completed'`. In general, most scrape requests complete within 90 seconds.
Error Handling
The API uses standard HTTP response codes to indicate the success or failure of an API request. In general:
- Codes in the 2xx range indicate success
- Codes in the 4xx range indicate an error that resulted from the provided information or the account (e.g., a required parameter was missing, insufficient credits, etc.)
- Codes in the 5xx range indicate an error with our servers
In addition to the HTTP status code, all error responses include a JSON object in the response body with an `error` key containing a human-readable error message.
Error Response Format
All error responses have the following structure:
{
"error": {
"type": "api-key-not-included",
"message": "API key was not included in the request."
}
}
Error Codes
Error Type | HTTP Status Code | Error Message |
---|---|---|
Successful Call | 200 | |
api-key-not-included | 403 | API key was not included in the request. |
out-of-credits | 402 | Credits expired. |
out-of-api-reads | 402 | API reads expired. |
results-not-found | 404 | Results not found. Ensure the results ID is correct and the recipe was run. |
recipe-not-found | 404 | The recipe was not found. Please make sure it exists and that the recipe ID is correct. |
user-not-found | 404 | User not found. |
request-timeout | 408 | The request timed out after 5 minutes. |
invalid-request | 400 | Invalid request format. |
invalid-value | 400 | Invalid value provided. |
rate-limit-exceeded | 429 | You have exceeded the rate limit. |
method-not-allowed | 405 | This method is not allowed for this endpoint. |
default | 500 | An unexpected error occurred. |
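Some of these errors are transient: `rate-limit-exceeded` (429) and `5xx` server errors may succeed on retry, while errors such as `out-of-credits` (402) should fail fast. Here's a hedged sketch of a retry-with-backoff wrapper; the retry count and delays are arbitrary choices, not API requirements.
// retry transient errors (429 and 5xx) with exponential backoff; fail fast otherwise
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable || attempt === maxRetries) {
      return response;
    }
    const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s...
    console.warn(`Got ${response.status}, retrying in ${delayMs / 1000}s...`);
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
}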
For persistent errors or issues not covered here, please contact customer support for assistance.
Code examples
Calling multiple URLs and handling timeout/async
// function to initiate scraping for a single URL
async function runScrapeForUrl(apiKey, recipeId, sourceUrl) {
  const url = `https://api.simplescraper.io/v1/recipes/${recipeId}/run`;
  const requestBody = {
    sourceUrl: sourceUrl,
    runAsync: false
  };
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`
      },
      body: JSON.stringify(requestBody)
    });
    const data = await response.json();
    if (data.status === 'running') {
      return { status: 'running', resultsId: data.results_id };
    } else if (data.status === 'error') {
      return { status: 'error', error: data.error };
    } else {
      return data;
    }
  } catch (error) {
    return { status: 'error', error: error.message };
  }
}
// main function to process multiple URLs
async function main() {
const apikey = 'your-api-key';
const recipeId = 'your-recipe-id';
const urls = [
'https://example.com/page1',
'https://example.com/page2',
'https://example.com/page3'
];
for (const url of urls) {
let result = await runScrapeForUrl(apikey, recipeId, url);
// if scrape is running asynchronously, poll for results
if (result.status === 'running') {
result = await pollForResults(result.resultsId, apikey); // pollForResults is defined in the next section
} else if (result.status === 'error') {
console.error(`Error scraping ${url}:`, result.error);
} else if (result.status === 'completed') {
console.log(`Successfully scraped ${url}`);
// process successful result here
}
}
}
main();
Polling for Results
To handle tasks that are still processing, implement a polling mechanism on the client side. Use a sensible interval of a few seconds to avoid overloading the endpoint.
// calling the v1/results/:resultsId endpoint
async function pollForResults(resultsId, apikey, maxAttempts = 10) {
const url = `https://api.simplescraper.io/v1/results/${resultsId}`;
for (let attempt = 0; attempt < maxAttempts; attempt++) {
try {
const response = await fetch(url, {
headers: {
'Authorization': `Bearer ${apikey}`
}
});
const data = await response.json();
if (data.status === 'completed') {
console.log('Scraping completed successfully:', data);
return data;
}
console.log('Job still processing, retrying in 5 seconds...');
await new Promise(resolve => setTimeout(resolve, 5000));
} catch (error) {
console.error('Error polling for result:', error);
}
}
console.error('Max polling attempts reached. Please check the job status manually.');
return null;
}
// call function
async function main() {
const result = await pollForResults('your-results-id', 'your-api-key');
if (result) {
console.log('Final result:', result);
} else {
console.log('Failed to retrieve results');
}
}
main();
Handling Errors
When working with our API, we recommend checking both the HTTP status code and the presence of an `error` key in the response body. Here's an example of how you might handle errors in your code:
async function makeApiRequest(apikey, endpoint) {
  try {
    const response = await fetch(`https://api.simplescraper.io/v1/${endpoint}`, {
      headers: {
        'Authorization': `Bearer ${apikey}`
      }
    });
    if (!response.ok) {
      const errorData = await response.json();
      throw new Error(errorData.error?.message || `HTTP error status: ${response.status}`);
    }
    const data = await response.json();
    return data; // process successful response
  } catch (error) {
    console.error('There was an error:', error.message);
    throw error; // rethrow so the caller can handle it
  }
}
// call function
async function main() {
try {
const data = await makeApiRequest('your-api-key', 'recipes/123456/run');
console.log('API response:', data);
} catch (error) {
console.error('Error in main:', error.message);
}
}
main();