
How to use Smart Extract

Simplescraper Smart Extract uses AI to accurately extract data and generate reusable CSS selectors from any website. All that's required is a URL and a data schema that lists the properties you wish to extract.

Smart Extract can be accessed via:

  • The Simplescraper dashboard (see Dashboard Usage below)
  • The Simplescraper API (see API Usage below)

Smart Extract + CSS selectors

Smart Extract returns valid CSS selectors, making it easy to convert AI-powered extractions into regular scrape recipes. Start with AI to quickly identify and validate the data you want, then switch to standard scraping for speed, accuracy, and scale.

This combines the flexibility of AI with the reliability of traditional scraping - meaning no hallucinations or context limits.


Dashboard Usage

  • Navigate to https://simplescraper.io/new
  • In the top input field, enter the URL of the page you wish to extract data from
  • In the bottom input field, enter a data schema (a comma-separated list of the properties you wish to extract)
  • Click the 'Extract Data' button
  • After a few seconds, the data will be returned in CSV and JSON format
  • Click the 'Save as a scrape recipe' button to convert the smart extraction into a regular scrape recipe, allowing you to scrape at scale using Simplescraper

API Usage
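
Smart Extract can also be called programmatically. The sketch below is a minimal example based on the endpoint used in the workflow further down this page (POST https://api.simplescraper.io/v1/smart-extract); treat the request body and response fields as assumptions drawn from that example rather than a complete API reference.

js
// Minimal sketch based on the workflow example below - replace YOUR_API_KEY with your own key.
const response = await fetch('https://api.simplescraper.io/v1/smart-extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/product-page',            // page to extract data from
    schema: 'product name, price, description, rating'  // comma-separated list of properties
  })
});

const result = await response.json(); // contains the extracted data and the CSS selectors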


Tips on writing your data schema

  • The schema should be a short, accurate list of the visible data points on the website that you wish to extract.

    • For example, if extracting data from a jobs board: 'Role, salary, location, job type, company, description, experience required' is a good schema.
  • Including a hint of what data is being extracted can increase accuracy.

    • For example, instead of 'name, location, price, size, bedrooms, bathrooms', adding a reference to the type of data can improve results: 'property name, location, price, size (sqm), bedrooms, bathrooms'.
  • A schema is not a prompt.

    • This works: "title, old price, current price, discount, review count, description, num capsules, rating".
    • This does not: "visit the website and extract everything on the page beginning with A".

Current limitations of Smart Extract

  • Image URLs are not extracted (support is coming soon)
  • The URL must be publicly accessible and not behind a login


Examples of using Simplescraper Smart Extract

The following is a list of websites and example schemas that return accurate data. Use a similar style of schema on the sites you wish to extract data from.

  • Website: https://carsandbids.com/
    Schema: car name, details, time remaining, bid price, location
  • Website: https://www.nike.com/gb/t/air-force-1-07-next-nature-shoes-67bFZC/DV3808-107
    Schema: price, old price, name, num of colors
  • Website: https://jobs.careers.microsoft.com/global/en/search
    Schema: job title, location, remote possible, description
  • Website: https://www.realestate.com.au/international/id/bali/
    Schema: price aud, price us, location, size (m2)
  • Website: https://x.com/emollick
    Schema: name, @tag, tagline, joined date, link, number of posts, top tweet text


Workflow example: Using Smart Extract to create a scrape recipe

One powerful way to use Smart Extract is as a starting point for creating a regular scrape recipe. This gives you the flexibility of AI to quickly identify the data you want, and then the speed, accuracy, and reliability of standard scraping for production use.

The idea: extract data and selectors → review → save as recipe → scale.


Step 1: Extract data and selectors using Smart Extract

Use Smart Extract to extract the data you want from a page. You'll get both the structured data and a list of CSS selectors for each field.

js
// Run Smart Extract: POST the page URL and data schema, get back extracted data and CSS selectors
async function runSmartExtract(apiKey, url, schema) {
  const response = await fetch('https://api.simplescraper.io/v1/smart-extract', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, schema })
  });

  const data = await response.json();
  return data;
}

Example usage:

js
const apiKey = 'YOUR_API_KEY';
const url = 'https://example.com/product-page';
const schema = 'product name, price, description, rating';

const result = await runSmartExtract(apiKey, url, schema);

Step 2: Review the results

You'll receive two important things in the response:

  • data: an array of structured values (the extracted data)
  • selectors: a list of CSS selectors for each field you requested

If the selectors look good, you're ready to convert them into a scrape recipe.
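
As a rough illustration, reviewing the result from Step 1 might look like the snippet below. It assumes selectors is keyed by field name, which may differ from the actual response shape, so adjust it to match what you receive.

js
// Illustrative only: print each requested field next to its returned CSS selector.
// Assumes `result.selectors` is keyed by field name - adjust if your response differs.
for (const [field, selector] of Object.entries(result.selectors)) {
  console.log(`${field}: ${selector}`);
}

console.log('Extracted data:', result.data);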


Step 3: Create a recipe using the returned selectors

Once you're happy with the fields and selectors, use them to create a new recipe. This lets you run scalable scrapes using Simplescraper's standard scraping engine.

js
// Create a standard scrape recipe from a name, target URL, and the selectors returned by Smart Extract
async function createRecipe(apiKey, name, url, selectors) {
  const response = await fetch('https://api.simplescraper.io/v1/recipes', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      name: name,
      url: url,
      selectors: selectors
    })
  });

  const recipe = await response.json();
  console.log('Recipe created:', recipe);
  return recipe;
}

Example usage (continued):

js
const recipeName = 'Product Page Scraper';
await createRecipe(apiKey, recipeName, url, result.selectors);

So start with Smart Extract to quickly find the right selectors, then save them as a recipe for fast, reliable scraping at scale. If the site changes later, run Smart Extract again and update the existing recipe using PUT /recipes/:recipeId - no need to rebuild your recipes by hand.
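
For example, refreshing a saved recipe after a site change might look like the sketch below. It assumes the update endpoint accepts the same selectors field as recipe creation, which is an assumption rather than documented API behavior.

js
// Sketch only: assumes PUT /recipes/:recipeId accepts the same fields as recipe creation.
async function updateRecipeSelectors(apiKey, recipeId, selectors) {
  const response = await fetch(`https://api.simplescraper.io/v1/recipes/${recipeId}`, {
    method: 'PUT',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ selectors })
  });
  return response.json();
}

// Re-run Smart Extract on the changed page, then refresh the saved recipe:
// const fresh = await runSmartExtract(apiKey, url, schema);
// await updateRecipeSelectors(apiKey, 'RECIPE_ID', fresh.selectors);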


Notes:

Smart Extract is in beta and may not be 100% accurate. If you encounter any issues or incorrect data, please contact us via chat.