
Extracting Markdown data

Simplescraper lets you easily extract a page's entire text content in Markdown format. Markdown preserves the page's formatting and is a preferred format when analyzing web data with AI models such as OpenAI's ChatGPT and Anthropic's Claude.

There are a few ways to extract website data in Markdown format with Simplescraper:

Via scrape recipe

  • When saving a scrape recipe (this guide covers saving recipes), open the Advanced options section and switch the 'Extract Markdown' toggle on

  • Run your recipe. The Markdown will appear in its own column, and a 'download Markdown' button will become available

  • Note that if the Markdown is very large (over 10MB), the file will be downloaded as a zip
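If you handle the Markdown in your own code, you can estimate in advance whether a result will cross the 10MB limit mentioned above. The JavaScript sketch below is illustrative only: Simplescraper does not state whether "10MB" means 10 * 1024 * 1024 bytes or 10,000,000 bytes, so the `ZIP_THRESHOLD_BYTES` constant and the helper names `utf8ByteLength` and `likelyDownloadedAsZip` are assumptions, not part of Simplescraper.

```javascript
// ASSUMPTION: "over 10MB" is interpreted as more than 10 * 1024 * 1024 bytes
// of UTF-8 text. Simplescraper's exact rule may differ.
const ZIP_THRESHOLD_BYTES = 10 * 1024 * 1024;

// Size of a string once encoded as UTF-8 (non-ASCII characters take 2-4 bytes).
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Hypothetical helper: true if the Markdown exceeds the assumed threshold.
function likelyDownloadedAsZip(markdown) {
  return utf8ByteLength(markdown) > ZIP_THRESHOLD_BYTES;
}
```

Measuring encoded bytes rather than `markdown.length` matters because JavaScript string length counts UTF-16 code units, which undercounts the file size for non-ASCII text.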

Via the API

  • When calling the Simplescraper API, include extractMarkdown: true in the body of the request

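      // Uses top-level await: run in an ES module or inside an async function.
      const apikey = 'ap1k3y'; // your Simplescraper API key
      const recipeId = 'your-recipe-id'; // the ID of the recipe to run
      
      const requestBody = {
        extractMarkdown: true, // return the page content as Markdown
      };
      
      const response = await fetch(`https://api.simplescraper.io/v1/recipes/${recipeId}/run`, {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${apikey}`,
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(requestBody)
      });
      
      const data = await response.json(); // parsed response body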
  • Please read the full API guide for more details on data extraction via the API: https://simplescraper.io/docs/api-guide
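For repeated use, the request can be wrapped in a small reusable helper. This is a sketch, not an official Simplescraper client: `buildMarkdownRunRequest` and `runRecipeWithMarkdown` are hypothetical names, the endpoint and headers are copied from the example above, and it assumes the API replies with JSON and reports failures with a non-2xx HTTP status. The optional `fetchImpl` parameter lets you pass a stub in tests.

```javascript
// Hypothetical helpers, not part of any official Simplescraper SDK.
const API_BASE = 'https://api.simplescraper.io/v1';

// Build the URL and fetch options for running a recipe with Markdown extraction on.
function buildMarkdownRunRequest(recipeId, apiKey) {
  const url = `${API_BASE}/recipes/${encodeURIComponent(recipeId)}/run`;
  const init = {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ extractMarkdown: true }),
  };
  return { url, init };
}

// Run the recipe and resolve with the parsed JSON body.
// ASSUMPTION: errors are signalled by a non-2xx status code.
async function runRecipeWithMarkdown(recipeId, apiKey, fetchImpl = globalThis.fetch) {
  const { url, init } = buildMarkdownRunRequest(recipeId, apiKey);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Simplescraper API request failed with status ${response.status}`);
  }
  return response.json();
}
```

In Node 18+ or a browser, `await runRecipeWithMarkdown('your-recipe-id', 'ap1k3y')` uses the built-in `fetch`. Checking `response.ok` surfaces problems such as an invalid API key as a thrown error instead of a silently unexpected response body.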

Via Auto-Crawler

  • Coming soon