Extracting Markdown data
Simplescraper lets you extract a page's entire text content in Markdown format. Markdown retains the page formatting and is a preferred format when analyzing web data with AI models such as OpenAI's ChatGPT and Anthropic's Claude.
There are a few ways to extract website data in Markdown format with Simplescraper:
Via scrape recipe
When saving a scrape recipe (this guide covers saving recipes), open the Advanced options section and switch the 'Extract Markdown' toggle on.
Run your recipe. The Markdown appears in its own column, and a 'download Markdown' button becomes available.
Note that if the Markdown is very large (over 10MB), the file is downloaded as a zip archive.
Via the API
When calling the Simplescraper API, include extractMarkdown: true in the body of the request:

// Run this inside an async function or an ES module so that `await` is available.
const apikey = 'ap1k3y';           // your API key
const recipeId = 'YOUR_RECIPE_ID'; // placeholder: the ID of the recipe to run

const requestBody = {
  extractMarkdown: true,
};

const response = await fetch(`https://api.simplescraper.io/v1/recipes/${recipeId}/run`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apikey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(requestBody)
});
Please read the full API guide for more details on data extraction via the API: https://simplescraper.io/docs/api-guide
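For reuse across several recipes, the request described in this section could be wrapped in a small helper. The following is a minimal sketch, not an official client: the endpoint and the extractMarkdown flag come from this guide, while the function names buildRunRequest and runRecipeWithMarkdown, the status check, and the assumption that the API responds with JSON are illustrative only. Check the API guide for the actual response shape.

```javascript
// Sketch only: the endpoint and the extractMarkdown flag come from this guide;
// the function names and the response handling are assumptions.
const API_BASE = 'https://api.simplescraper.io/v1';

// Build the URL and fetch options for a recipe run with Markdown extraction enabled.
function buildRunRequest(recipeId, apiKey) {
  return {
    url: `${API_BASE}/recipes/${encodeURIComponent(recipeId)}/run`,
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ extractMarkdown: true }),
    },
  };
}

// Send the request and fail loudly on a non-2xx status.
async function runRecipeWithMarkdown(recipeId, apiKey) {
  const { url, options } = buildRunRequest(recipeId, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Simplescraper request failed with status ${response.status}`);
  }
  // Assumption: the API responds with JSON; see the API guide for the actual shape.
  return response.json();
}
```

A call such as runRecipeWithMarkdown('YOUR_RECIPE_ID', 'YOUR_API_KEY') returns a promise for the parsed response. Keeping the request construction in its own function makes it possible to inspect the URL, headers, and body without sending anything over the network.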
Via Auto-Crawler
- Coming soon