How to scrape an infinite scroll page in Puppeteer

Updated 2026-06-25 · 6 min read

If you're scraping a feed that keeps loading more rows as you scroll, a plain fetch or a single page.content() only ever gives you the first screen of items. The rest are not in the initial HTML at all. They arrive batch by batch over background requests as the viewport reaches the bottom, so until something does the scrolling, most of the list stays out of reach.

The solution is to drive the scrolling yourself inside a page.evaluate loop, which runs your code in the live page where window.scrollTo and the page's scrollHeight exist. We'll build a small script that launches headless Chrome and opens the feed, scrolls to the bottom and pauses after each jump so the next batch can load, compares the page height before and after each scroll so it knows when the feed is exhausted, and reads every item out of the fully expanded DOM in one pass. It takes about 55 lines of Node.js with one open-source library.

The complete script

// infinite-scroll.mjs
import puppeteer from 'puppeteer'

const url = 'https://www.scrapingcourse.com/infinite-scrolling'

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()
await page.goto(url, { waitUntil: 'networkidle2' })

/* scroll to the bottom repeatedly until the page stops getting taller.
   scrollHeight grows each time a new batch is appended; when it holds
   steady across `maxStaleScrolls` tries, the feed is exhausted. */
async function scrollToBottom({ settleMs = 1500, maxScrolls = 100, maxStaleScrolls = 2 } = {}) {
  let previousHeight = 0
  let staleScrolls = 0
  let scrolls = 0

  while (staleScrolls < maxStaleScrolls && scrolls < maxScrolls) {
    /* jump to the bottom and read the height from inside the page. */
    const currentHeight = await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight)
      return document.body.scrollHeight
    })

    /* give the background request time to return and the rows time to render. */
    await new Promise(resolve => setTimeout(resolve, settleMs))

    const newHeight = await page.evaluate(() => document.body.scrollHeight)
    if (newHeight > currentHeight || currentHeight > previousHeight) {
      staleScrolls = 0 /* the page grew, so there is probably more to load. */
    } else {
      staleScrolls++ /* no growth. could be a slow batch, so retry before stopping. */
    }

    previousHeight = newHeight
    scrolls++
  }
}

await scrollToBottom()

/* the full list is in the DOM now. read every card in one pass. */
const items = await page.$$eval('.product-item', els =>
  els.map(el => ({
    name: el.querySelector('.product-name')?.textContent.trim() ?? null,
    price: el.querySelector('.product-price')?.textContent.trim() ?? null
  }))
)

console.log(`Loaded ${items.length} items`)
console.log(items.slice(0, 5))

await browser.close()

Install Puppeteer and run the script:

npm install puppeteer
node infinite-scroll.mjs

How it works

Open the page with networkidle2. Infinite-scroll feeds usually fetch their first batch over a background request after the initial HTML lands. Waiting for the network to settle means the first items are present before the loop starts, so the height measured on the first pass is the real starting height and not the height of an empty shell. networkidle2 resolves once the page has had no more than two open network connections for at least 500 ms. A page that keeps more than that busy (persistent sockets, polling requests, a streaming video) never gets there, so page.goto waits out the navigation timeout (30 seconds by default) and throws. For those pages, switch to waitUntil: 'domcontentloaded' and then await page.waitForSelector('.product-item') so you proceed once the first items exist.
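As a minimal sketch of that fallback, the helper below opens the feed without depending on network idle. The name openFeed and the 15-second selector timeout are illustrative assumptions, and page is expected to be the Puppeteer Page created in the script above.

```javascript
/* Hypothetical helper: open a feed without relying on network idle.
   `page` is a Puppeteer Page; `itemSelector` is an assumption about the markup. */
async function openFeed(page, url, itemSelector = '.product-item') {
  // domcontentloaded fires once the HTML is parsed, even if sockets or polls stay open.
  await page.goto(url, { waitUntil: 'domcontentloaded' })
  // Proceed only once at least one item from the first batch is in the DOM.
  await page.waitForSelector(itemSelector, { timeout: 15000 })
}
```

In the script, this would stand in for the page.goto line: await openFeed(page, url).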

Scroll from inside page.evaluate. The scroll has to happen in the page's own context, because window.scrollTo and document.body.scrollHeight only exist there. Returning scrollHeight from the same call gives you the height the moment after the jump, before the new batch has had time to load, which is the number you compare against later. Two layouts break the hard jump to the bottom. Some feeds put the scrollbar on an inner <div> with overflow: auto, where window.scrollTo moves nothing; set that element's scrollTop to its scrollHeight and measure that element instead of the body. Others load from an IntersectionObserver sentinel placed a screen or so above the bottom, and a single leap to scrollHeight can land past the trigger without firing it; scroll in steps with window.scrollBy(0, window.innerHeight) and a short pause between steps so you cross every trigger point on the way down.
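Both workarounds can be sketched as small helpers. The names scrollContainerToBottom and scrollInSteps, the '#feed' selector, and the 250 ms step pause are assumptions for illustration, not part of the script above; page is a Puppeteer Page.

```javascript
/* Hypothetical selector for the scrollable inner container; adjust per site. */
const FEED_SELECTOR = '#feed'

/* Inner-container case: scroll the element itself and return its height,
   which replaces document.body.scrollHeight as the number to compare. */
async function scrollContainerToBottom(page, selector = FEED_SELECTOR) {
  return page.$eval(selector, el => {
    el.scrollTop = el.scrollHeight
    return el.scrollHeight
  })
}

/* Sentinel case: walk down one viewport at a time so every trigger
   point passes through the viewport on the way to the bottom. */
async function scrollInSteps(page, { stepPauseMs = 250, maxSteps = 50 } = {}) {
  for (let step = 0; step < maxSteps; step++) {
    const atBottom = await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight)
      // 1px tolerance for fractional scroll positions.
      return window.scrollY + window.innerHeight >= document.body.scrollHeight - 1
    })
    if (atBottom) break
    await new Promise(resolve => setTimeout(resolve, stepPauseMs))
  }
}
```

Either helper would take the place of the single window.scrollTo jump inside the loop; the settle delay and the height comparison stay as they are.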

Pause for a settle delay after each scroll. The batch request fires when the viewport reaches the bottom, and the framework appends the rows a moment after the response arrives. The settleMs wait covers both. Set it too low and you measure the height before the new rows render, which ends the run early. 1500 ms is a reasonable starting point that you can tune per site. Alternatively, replace the fixed wait with page.waitForResponse keyed to the batch request's URL, so the wait ends when the data lands instead of after a guess.
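A rough sketch of the response-based wait follows. The '/api/products' fragment is a made-up placeholder for whatever URL the batch request uses on the target site (check the Network tab), and scrollAndWaitForBatch is a hypothetical helper name.

```javascript
/* Hypothetical: assume the batch endpoint's URL contains this fragment. */
const BATCH_URL_FRAGMENT = '/api/products'

/* Scroll to the bottom and resolve to true if a batch response arrived,
   or false if none showed up within `timeoutMs`. */
async function scrollAndWaitForBatch(page, { timeoutMs = 5000 } = {}) {
  // Register the wait before scrolling so a fast response is not missed.
  const batchArrived = page
    .waitForResponse(
      res => res.url().includes(BATCH_URL_FRAGMENT) && res.ok(),
      { timeout: timeoutMs }
    )
    .then(() => true)
    .catch(() => false) // treated as "no batch"; this also hides non-timeout errors

  await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight))
  return batchArrived
}
```

The response landing does not guarantee the rows have rendered yet, so a short pause before measuring the height may still be needed. A false result can be counted the same way as a stale scroll.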

Stop on a stale-height count, not the first non-growth. A single scroll that adds no height might just be a slow batch. The staleScrolls counter ends the loop only after two consecutive no-growth scrolls (maxStaleScrolls), so one slow response does not end the run with half the feed. The maxScrolls ceiling is the backstop for a feed that never stops loading and would otherwise fill memory with DOM nodes. Lower it to cover the rows you need, or count the .product-item elements inside the loop and break once the count passes a target you set for the job.
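One way to stop at a target is sketched below. Note that this variant swaps the growth signal from page height to item count, which is a different technique from the script above; scrollUntil and targetCount are hypothetical names, and page is a Puppeteer Page.

```javascript
/* Hypothetical variant: scroll until `targetCount` items exist, the item
   count stops growing, or `maxScrolls` is reached. Returns the last count. */
async function scrollUntil(page, {
  itemSelector = '.product-item',
  targetCount = 200,
  settleMs = 1500,
  maxScrolls = 100,
  maxStaleScrolls = 2
} = {}) {
  let previousCount = 0
  let staleScrolls = 0

  for (let scrolls = 0; scrolls < maxScrolls && staleScrolls < maxStaleScrolls; scrolls++) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight))
    await new Promise(resolve => setTimeout(resolve, settleMs))

    // Count the cards instead of measuring the page height.
    const count = await page.$$eval(itemSelector, els => els.length)
    if (count >= targetCount) return count // enough rows for the job

    staleScrolls = count > previousCount ? 0 : staleScrolls + 1
    previousCount = count
  }
  return previousCount
}
```

Counting items costs one extra DOM query per scroll, but it gives the loop a direct handle on "enough rows" without waiting for the feed to run dry.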

Read the items after the loop, not during it. Once scrolling stops, the whole list is in the DOM, so a single $$eval pass over .product-item collects every card. Reading once at the end is simpler than collecting on each scroll and deduplicating, since the appended rows stay in the document. Images in rows that scrolled by quickly may still carry a placeholder src with the real URL in data-src, so read data-src for any image the viewport passed too quickly to swap, or scroll in smaller steps so each batch sits in view long enough. See How to scrape lazy-loaded images.
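A small sketch of the data-src fallback, assuming each card contains a single <img> element (an assumption about the markup, as is the helper name readItemsWithImages):

```javascript
/* Hypothetical helper: read each card's name plus its image URL,
   preferring data-src over a possibly-placeholder src. */
async function readItemsWithImages(page, itemSelector = '.product-item') {
  return page.$$eval(itemSelector, els =>
    els.map(el => {
      const img = el.querySelector('img')
      return {
        name: el.querySelector('.product-name')?.textContent.trim() ?? null,
        // getAttribute returns null when the attribute is missing,
        // so ?? falls through to src only when data-src is absent.
        image: img ? (img.getAttribute('data-src') ?? img.getAttribute('src')) : null
      }
    })
  )
}
```

The callback is kept self-contained because Puppeteer serializes it and runs it in the page, where helpers defined in the Node script are not visible.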

Use this when

A feed, search result, or catalog appends more items as you scroll toward the bottom, the new rows are not in the initial HTML, and you want the whole list in one pass.

Skip this when

The page loads more behind a button you click instead of on scroll (drive the button in a loop, see How to handle load more buttons when scraping); the same data is available through a paginated API or a ?page=N URL (fetch the JSON directly, which is lighter than driving a browser); the full list is already in the first HTML response (a plain fetch and a parser cover it); or you only need the items above the fold (no scrolling required).
