Multiple page and infinite scroll scraping

You can scrape multiple pages and infinite scroll pages with Simplescraper. Before we dig into how, two important things to note:

  1. The crawler may be a better option: if the URLs of the pages that you want to scrape follow a structured pattern like 'page=1' or 'page/2', it is quicker to scrape them by pasting the URL of each page into the crawler. See Crawling lists of URLs for instructions on how to use the crawler.
  2. Pagination and infinite scroll should not be selected at the same time. If the page loads more content as you scroll down without any clicking required, use infinite scroll only. If a click of any kind is required, use pagination only.
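
If the page URLs follow a 'page=1' style pattern, the list to paste into the crawler can be generated rather than typed by hand. The following is a minimal sketch, not part of Simplescraper: the function name build_page_urls, the example.com address and the 'page' parameter name are all assumptions chosen for illustration, so adjust them to match the site you are scraping.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(base_url: str, param: str, first: int, last: int) -> list[str]:
    """Return one URL per page by setting a query parameter such as 'page'.

    Hypothetical helper: the parameter name and page range are assumptions
    about how the target site numbers its pages.
    """
    parts = urlsplit(base_url)
    # Keep any existing query parameters except the page parameter itself.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    urls = []
    for page in range(first, last + 1):
        page_query = urlencode(query + [(param, str(page))])
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parts.path, page_query, parts.fragment))
        )
    return urls


if __name__ == "__main__":
    # Print one URL per line, ready to paste into the crawler.
    for url in build_page_urls("https://example.com/products?sort=new", "page", 1, 3):
        print(url)
```

For sites that use a path pattern such as 'page/2' instead of a query parameter, the same idea applies, but the page number would be appended to the path rather than to the query string.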

Pagination

To extract data across multiple pages, you must teach Simplescraper how to navigate:

  • Identify the element used to navigate to the next page (typically a 'next' button or an arrow). Click the pagination icon in the top menu to activate detection, then click the navigation element. If the selection succeeds, the pagination icon turns green, and hovering over it shows the CSS selector of the navigation element.
  • On some pages, clicking the navigation element will unintentionally load the next page. In this case, hover over the navigation element and press the SHIFT key to select it instead of clicking.
  • When saving the recipe you will be presented with an option asking how many pages you wish to navigate.

Infinite scroll

  • If the page that you intend to scrape loads new content automatically as you scroll down and you wish to scrape this data, click the infinite scroll icon located in the extension's top menu.
  • The icon will turn green to indicate that it's active.
  • When saving the recipe you will be presented with an option asking how many times you wish the automated recipe to scroll.