How to use Smart Extract
Simplescraper Smart Extract uses AI to accurately extract data and generate reusable CSS selectors from any website. All that's required is a URL and a data schema that lists the properties you wish to extract.
Smart Extract can be accessed via:
- The Simplescraper dashboard - click on 'Get Data' in the sidebar then 'Smart Extract'
- At scrape.new
- Programmatically via the API. Please see the docs here: https://simplescraper.io/docs/api-guide#post-smart-extract
Smart Extract + CSS selectors
Smart Extract returns valid CSS selectors, making it easy to convert AI-powered extractions into regular scrape recipes. Start with AI to quickly identify and validate the data you want, then switch to standard scraping for speed, accuracy, and scale.
This combines the flexibility of AI with the reliability of traditional scraping - meaning no hallucinations or context limits.
Dashboard Usage
- Navigate to https://simplescraper.io/new
- In the top input field, enter the URL of the page you wish to extract data from
- In the bottom input field, enter a data schema (a comma-separated list of properties you wish to extract)
- Click the 'Extract Data' button
- After a few seconds, the data will be returned in CSV and JSON format
- Click the 'Save as a scrape recipe' button to convert the smart extraction into a regular scrape recipe, allowing you to scrape at scale using Simplescraper
API Usage
- API docs can be found here: https://simplescraper.io/docs/api-guide#post-smart-extract.
Tips on writing your data schema
The schema provided should be a short, accurate list of each of the visible data points on the website that you wish to extract.
- For example, if extracting data from a jobs board: 'Role, salary, location, job type, company, description, experience required' is a good schema.
Including a hint about the type of data being extracted can improve accuracy.
- For example, instead of 'name, location, price, size, bedrooms, bathrooms', add context to each property: 'property name, location, price, size (sqm), bedrooms, bathrooms'.
A schema is not a prompt.
- This works: "title, old price, current price, discount, review count, description, num capsules, rating".
- This does not: "visit the website and extract everything on the page beginning with A".
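Since the schema is just a comma-separated string, a small helper (hypothetical, not part of the Simplescraper API) can tidy it up before sending, trimming stray whitespace and dropping empty entries:

```javascript
// Hypothetical helper: normalize a comma-separated schema string by
// trimming whitespace around each property and removing empty entries.
function normalizeSchema(schema) {
  return schema
    .split(',')
    .map(field => field.trim())
    .filter(field => field.length > 0)
    .join(', ');
}

// normalizeSchema('title, old price ,, rating ') → 'title, old price, rating'
```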
Current limitations of Smart Extract
- Image URLs are not extracted (support is coming soon)
- The URL must be publicly accessible and not behind a login
Examples of using Simplescraper Smart Extract
The following is a list of websites and example schemas that would return accurate data. Use similarly styled schemas on the sites you wish to extract data from.
Website | Schema |
---|---|
https://carsandbids.com/ | car name, details, time remaining, bid price, location |
https://www.nike.com/gb/t/air-force-1-07-next-nature-shoes-67bFZC/DV3808-107 | price, old price, name, num of colors |
https://jobs.careers.microsoft.com/global/en/search | job title, location, remote possible, description |
https://www.realestate.com.au/international/id/bali/ | price aud, price us, location, size (m2) |
https://x.com/emollick | name, @tag, tagline, joined date, link, number of posts, top tweet text |
Workflow example: Using Smart Extract to create a scrape recipe
One powerful way to use Smart Extract is as a starting point for creating a regular scrape recipe. This gives you the flexibility of AI to quickly identify the data you want, and then the speed, accuracy, and reliability of standard scraping for production use.
The idea: extract data and selectors → review → save as recipe → scale.
Step 1: Extract data and selectors using Smart Extract
Use Smart Extract to extract the data you want from a page. You'll get both the structured data and a list of CSS selectors for each field.
```javascript
// Call the Smart Extract endpoint with a page URL and a comma-separated schema.
async function runSmartExtract(apiKey, url, schema) {
  const response = await fetch('https://api.simplescraper.io/v1/smart-extract', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, schema })
  });
  if (!response.ok) {
    throw new Error(`Smart Extract request failed: ${response.status}`);
  }
  return response.json();
}
```
Example usage:

```javascript
const apiKey = 'YOUR_API_KEY';
const url = 'https://example.com/product-page';
const schema = 'product name, price, description, rating';
const result = await runSmartExtract(apiKey, url, schema);
```
Step 2: Review the results
You'll receive two important things in the response:
- `data`: an array of structured values (the extracted data)
- `selectors`: a list of CSS selectors for each field you requested
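For illustration, a response might be shaped along these lines (the `data` and `selectors` field names follow the description above, but the exact structure of each entry is an assumption here; check the API docs for the authoritative format):

```json
{
  "data": [
    { "product name": "Example Widget", "price": "$19.99" }
  ],
  "selectors": [
    { "property": "product name", "selector": "h1.product-title" },
    { "property": "price", "selector": "span.price" }
  ]
}
```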
If the selectors look good, you're ready to convert them into a scrape recipe.
Step 3: Create a recipe using the returned selectors
Once you're happy with the fields and selectors, use them to create a new recipe. This lets you run scalable scrapes using Simplescraper's standard scraping engine.
```javascript
// Create a standard scrape recipe from the selectors Smart Extract returned.
async function createRecipe(apiKey, name, url, selectors) {
  const response = await fetch('https://api.simplescraper.io/v1/recipes', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ name, url, selectors })
  });
  if (!response.ok) {
    throw new Error(`Recipe creation failed: ${response.status}`);
  }
  const recipe = await response.json();
  console.log('Recipe created:', recipe);
  return recipe;
}
```
Example usage (continued):

```javascript
const recipeName = 'Product Page Scraper';
await createRecipe(apiKey, recipeName, url, result.selectors);
```
So start with Smart Extract to quickly find the right selectors, then save them as a recipe for fast, reliable scraping at scale. If the site changes later, you can run Smart Extract again and update the existing recipe using `PUT /recipes/:recipeId` - no need to update your recipes manually.
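As a sketch of that refresh step, assuming the same base URL, auth header, and JSON body conventions as the examples above (the exact request fields accepted by `PUT /recipes/:recipeId` are defined in the API docs):

```javascript
// Hypothetical helper: assemble the request for PUT /recipes/:recipeId.
// Splitting this out keeps the URL/body construction easy to inspect.
function buildUpdateRequest(apiKey, recipeId, selectors) {
  return {
    url: `https://api.simplescraper.io/v1/recipes/${recipeId}`,
    options: {
      method: 'PUT',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ selectors })
    }
  };
}

// Update an existing recipe with freshly extracted selectors.
async function updateRecipe(apiKey, recipeId, selectors) {
  const { url, options } = buildUpdateRequest(apiKey, recipeId, selectors);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Recipe update failed: ${response.status}`);
  }
  return response.json();
}
```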
Notes:
Smart Extract is in beta and may not be 100% accurate. If you encounter any issues or incorrect data, please contact us via chat.