Preventing duplicates by using a unique results key

Video showing how to set a property as a unique key.

Please note that this guide is likely unnecessary if you're using the crawler or only scraping a webpage occasionally. Avoiding duplicates matters most when you repeatedly and frequently scrape data from a single page that updates over time.

If you are scraping a single page repeatedly, so that it may contain both data that has already been scraped and new data that has yet to be scraped, you may find it helpful to set a unique key to avoid saving the same data each time the recipe runs.

A unique key is a value that is associated with a particular selection of data and is not repeated anywhere else on the page. Common examples are a URL, an address, or a product ID. By selecting a unique key, you ensure that each selection of data is only saved once.
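To judge whether a property qualifies as a unique key, it can help to picture the scraped results as rows and check that the property's value is present in every row and never repeats. The Python below is a minimal illustrative sketch, not Simplescraper's own logic; the function `is_unique_key`, the property names `title` and `link`, and the example URLs are all hypothetical.

```python
def is_unique_key(rows: list[dict], prop: str) -> bool:
    """Return True if `prop` has a non-empty value in every row and no value repeats.

    Hypothetical helper for illustration only; it is not part of Simplescraper.
    """
    seen = set()
    for row in rows:
        value = row.get(prop)
        if not value:
            # A missing or empty value cannot identify the row.
            return False
        if value in seen:
            # The same value appears twice, so it cannot tell those rows apart.
            return False
        seen.add(value)
    return True


# Hypothetical rows scraped from a job board listing page.
jobs = [
    {"title": "Data Engineer", "link": "https://example.com/jobs/101"},
    {"title": "Data Engineer", "link": "https://example.com/jobs/102"},
    {"title": "Web Developer", "link": "https://example.com/jobs/103"},
]

print(is_unique_key(jobs, "title"))  # False: two rows share the same title
print(is_unique_key(jobs, "link"))   # True: every link is different
```

Job titles can easily repeat on a listing page, which is why the link is the safer choice of key in this example.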

To set a unique key:

  • create a recipe that includes at least one property that is unique. For example, if you're scraping a list of links from a job board, a property that scrapes each link would be an ideal choice

  • when saving or editing the recipe, click 'Show advanced options', then in the 'Unique results key' section (previously called 'Results UID key') select the unique property you created from the dropdown field

  • save the scrape recipe

Now, any time the scrape recipe runs, data that has already been scraped will be ignored and only new data will be saved to Simplescraper.
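The general technique behind this behavior is key-based deduplication: remember every key value that has been saved, and skip any incoming row whose key value is already known. The Python below is a minimal sketch of that idea, not Simplescraper's actual implementation. `ResultsStore` and `save_new` are hypothetical names, the in-memory set stands in for whatever persistent storage a real service would use between runs, and the sketch assumes every row has a value for the key property.

```python
class ResultsStore:
    """Hypothetical store that keeps scraped rows and ignores rows whose key was already saved."""

    def __init__(self, unique_key: str):
        self.unique_key = unique_key
        self.rows: list[dict] = []
        # In-memory stand-in for persistent storage of already-saved key values.
        self.seen: set = set()

    def save_new(self, scraped: list[dict]) -> list[dict]:
        """Save only rows with an unseen key value; return the rows that were added."""
        added = []
        for row in scraped:
            key_value = row[self.unique_key]  # assumes every row has the key property
            if key_value in self.seen:
                continue  # already saved earlier: ignore it
            self.seen.add(key_value)
            self.rows.append(row)
            added.append(row)
        return added


store = ResultsStore(unique_key="link")

# First run: both jobs are new.
run1 = [
    {"title": "Data Engineer", "link": "https://example.com/jobs/101"},
    {"title": "Web Developer", "link": "https://example.com/jobs/102"},
]
print(len(store.save_new(run1)))  # 2

# Second run: the page now lists one new job above the two old ones.
run2 = [
    {"title": "QA Analyst", "link": "https://example.com/jobs/103"},
    {"title": "Data Engineer", "link": "https://example.com/jobs/101"},
    {"title": "Web Developer", "link": "https://example.com/jobs/102"},
]
print([row["title"] for row in store.save_new(run2)])  # ['QA Analyst']
print(len(store.rows))  # 3
```

Because the key value is recorded as soon as a row is saved, the same check also prevents a row from being saved twice within a single run.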

For questions or help please reach out via chat.