How to handle load more buttons when scraping

Updated 2026-06-24 · 6 min read

If the page you're scraping only shows its first handful of results until you click a "Load more" button, a plain fetch of the HTML will only ever return that first handful. The rest of the rows load in later, a batch at a time, each time the button is clicked, so to get all of them your scraper has to do the clicking itself.

The hard part is timing: you have to know when each click has actually finished loading before you fire the next one. The solution is to drive the button in a loop with Puppeteer and wait for the network response after each click. We'll build a small script that opens the list page in headless Chrome and finds the button by its visible text so the loop survives a CSS rename, clicks it and waits for the matching background request to return before starting the next iteration so we never guess at the timing, stops once the button is gone or a couple of clicks add no new rows, and reads every item out of the fully expanded page in one pass. It takes about 70 lines of Node.js with one open-source library.

The complete script

// load-more.mjs
import puppeteer from 'puppeteer'

const url = 'https://www.scrapingcourse.com/button-click'

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()
await page.goto(url, { waitUntil: 'networkidle2' })

/* find a button by its visible text. class names churn; the label "Load more" rarely does. */
async function findButtonByText(text) {
  const handles = await page.$$('button, a[role="button"]')
  for (const handle of handles) {
    const label = await handle.evaluate(el => el.textContent.trim().toLowerCase())
    if (label.includes(text.toLowerCase())) return handle
  }
  return null
}

/* count the items currently in the DOM so we can tell whether a click actually added any. */
const countItems = () => page.$$eval('.product-item', els => els.length)

let previousCount = await countItems()
let emptyClicks = 0
const maxEmptyClicks = 2

while (emptyClicks < maxEmptyClicks) {
  const button = await findButtonByText('load more')
  if (!button) break /* button gone means the list is fully expanded. */

  /* click and wait for the data response together. starting the wait before the click
     avoids the race where the XHR resolves before the listener is attached. */
  const [response] = await Promise.all([
    page.waitForResponse(
      res => res.url().includes('/ajax') && res.status() === 200,
      { timeout: 15000 }
    ).catch(() => null),
    button.click()
  ])

  /* the DOM updates a tick after the response lands; wait for the count to actually rise. */
  await page
    .waitForFunction(
      (prev, sel) => document.querySelectorAll(sel).length > prev,
      { timeout: 5000 },
      previousCount,
      '.product-item'
    )
    .catch(() => null)

  const currentCount = await countItems()
  if (currentCount > previousCount) {
    previousCount = currentCount
    emptyClicks = 0 /* progress made, reset the guard. */
  } else {
    emptyClicks++ /* no new rows. could be a slow response, so retry before giving up. */
  }
}

const items = await page.$$eval('.product-item', els =>
  els.map(el => ({
    name: el.querySelector('.product-name')?.textContent.trim() ?? null,
    price: el.querySelector('.product-price')?.textContent.trim() ?? null
  }))
)

console.log(`Loaded ${items.length} items`)
console.log(items.slice(0, 5))

await browser.close()

bash

npm install puppeteer
node load-more.mjs

How it works

Open the page with networkidle2. Many list pages load their first batch over XHR, the background request a page fires to fetch data without reloading, after the initial HTML. Waiting for the network to settle means the first items and the button are present before the loop starts, so the first findButtonByText call does not return null on a page that is still booting.

Match the button by text, not by class. A selector like .btn-load-more-v2 ties your scraper to one build of the site's CSS. Reading textContent and matching "load more" survives a class rename and works across the many sites that label this button the same way. The helper checks both <button> and <a role="button"> because plenty of sites style a link as the trigger. Some sites also leave the button in place but set disabled or display:none once the list is exhausted, where a disabled button clicks as a no-op and a hidden one makes button.click() throw "Node is not visible", so skip a button whose el.disabled is true or el.offsetParent is null and treat that as the end of the list.

Wait on the response and the click in one Promise.all. The listener has to be attached before the click fires, otherwise a quick server answers before waitForResponse is listening and the wait hangs until the timeout. Putting both in the same Promise.all attaches the listener first, then clicks. The .catch(() => null) keeps a missed response from throwing, since the count check is the real source of truth. Keep the predicate tight, because a loose one like res => res.status() === 200 resolves on the first analytics ping or image rather than the data call: open the Network tab, find the request the button actually fires, and match its real path, as res.url().includes('/ajax') does here.

Confirm the DOM grew before trusting the click. A 200 response does not guarantee the rows are painted; there is a short gap between the data arriving and the framework rendering it. page.waitForFunction polls the live page until querySelectorAll(sel).length exceeds the previous count, which bridges that gap.

Use a retry guard, not a single failed click, to exit. One click that adds nothing might be a response that came back slow. The emptyClicks counter allows two no-progress clicks before breaking, so a slow batch does not end the run with half the list. Reset it to zero every time the count rises. On a feed that loads forever this while loop becomes an effectively infinite scrape that fills memory with DOM nodes, so add a maxClicks ceiling or a target count to the loop condition and break once you have enough rows. One last trap on the readout: rows that lazy-load images carry a placeholder src until they scroll into view, so read the data-src attribute or scroll each batch into view before extracting.

Use this when

A list, search result, or catalog page hides most of its content behind a button you have to click, the button fires an XHR for each new batch, and you want every item in one pass.

Skip this when

The rows load automatically as you scroll instead of on a click (use an infinite-scroll loop); the site exposes the same data through a paginated API or a ?page=N URL (fetch the JSON directly, which is lighter than driving a browser); the full list is already in the initial HTML (a plain fetch and a parser are enough); or you only need the first visible batch (no clicking required).

How to handle load more buttons when scraping ​

The complete script ​

How it works ​

Related guides ​

Skip the code, just get the data Simplescraper turns any website into structured data in seconds.

How to handle load more buttons when scraping

The complete script

How it works

Related guides

Skip the code, just get the data
Simplescraper turns any website into structured data in seconds.