Simplescraper

How to handle load more buttons when scraping


Updated 2026-06-24 · 6 min read

If the page you're scraping shows only its first handful of results until you click a "Load more" button, a plain fetch of the HTML returns only that first handful. The remaining rows arrive a batch at a time, one batch per click, so to get all of them your scraper has to do the clicking itself.

The hard part is timing: you have to know that one click's batch has finished loading before you fire the next click. The solution is to drive the button in a loop with Puppeteer, wait after each click for the batch's network response and for the row count to rise, and stop once the button is gone or stops adding rows. It takes about 70 lines of Node.js with one open-source library.

Key terms

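  • XHR (XMLHttpRequest). The background request the page fires to fetch the next batch of items when the button is clicked, without reloading the page. Some sites use fetch() instead; Puppeteer reports both as network responses, so the same wait applies.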
  • page.waitForResponse. A Puppeteer method that blocks until a network response matching a predicate arrives, used here to wait for the batch request to return 200.
  • page.waitForFunction. A Puppeteer method that polls the live page until a condition is true, used here to wait until the item count actually rises after the response.
  • Render tick. The short gap between the data arriving and the framework painting the new rows into the DOM, which is why the count is not trusted immediately.
  • $$eval. A Puppeteer method that runs a function over all elements matching a selector inside the page and returns the result, used to count and to read out the items.

Here is what the script does:

  • Launch headless Chrome with Puppeteer and open the list page that hides its full content behind a "Load more" button.
  • Find the button by its visible text rather than a brittle CSS class, so the loop survives a markup change.
  • Click the button and wait for the matching XHR response with page.waitForResponse, so the next iteration only starts once the new items have actually arrived.
  • Stop when the button disappears or stops adding rows, with a retry counter that gives up only after two consecutive no-progress clicks.
  • Read the items out of the fully expanded DOM in one pass.

The complete script

```js
// load-more.mjs
import puppeteer from 'puppeteer'

const url = 'https://www.scrapingcourse.com/button-click'

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()
await page.goto(url, { waitUntil: 'networkidle2' })

/* Find a button by its visible text. Class names churn; the label "Load more" rarely does. */
async function findButtonByText(text) {
  const handles = await page.$$('button, a[role="button"]')
  for (const handle of handles) {
    const label = await handle.evaluate(el => el.textContent.trim().toLowerCase())
    if (label.includes(text.toLowerCase())) return handle
  }
  return null
}

/* Count the items currently in the DOM so we can tell whether a click actually added any. */
const countItems = () => page.$$eval('.product-item', els => els.length)

let previousCount = await countItems()
let emptyClicks = 0
const maxEmptyClicks = 2

while (emptyClicks < maxEmptyClicks) {
  const button = await findButtonByText('load more')
  if (!button) break /* Button gone means the list is fully expanded. */

  /* Click and wait for the data response together. Starting the wait before the click
     avoids the race where the XHR resolves before the listener is attached. */
  const [response] = await Promise.all([
    page.waitForResponse(
      res => res.url().includes('/ajax') && res.status() === 200,
      { timeout: 15000 }
    ).catch(() => null),
    button.click()
  ])

  /* The DOM updates a tick after the response lands; wait for the count to actually rise. */
  await page
    .waitForFunction(
      (prev, sel) => document.querySelectorAll(sel).length > prev,
      { timeout: 5000 },
      previousCount,
      '.product-item'
    )
    .catch(() => null)

  const currentCount = await countItems()
  if (currentCount > previousCount) {
    previousCount = currentCount
    emptyClicks = 0 /* Progress made, reset the guard. */
  } else {
    emptyClicks++ /* No new rows. Could be a slow response, so retry before giving up. */
  }
}

const items = await page.$$eval('.product-item', els =>
  els.map(el => ({
    name: el.querySelector('.product-name')?.textContent.trim() ?? null,
    price: el.querySelector('.product-price')?.textContent.trim() ?? null
  }))
)

console.log(`Loaded ${items.length} items`)
console.log(items.slice(0, 5))

await browser.close()
```

```bash
npm install puppeteer
node load-more.mjs
```

What each step does

Open the page with networkidle2. Many list pages load their first batch over XHR after the initial HTML. networkidle2 treats navigation as finished once there have been no more than two open network connections for at least 500 ms, so the first items and the button are present before the loop starts and the first findButtonByText call does not return null on a page that is still booting.

Match the button by text, not by class. A selector like .btn-load-more-v2 ties your scraper to one build of the site's CSS. Reading textContent and matching "load more" survives a class rename and works across the many sites that label this button the same way. The helper checks both <button> and <a role="button"> because plenty of sites style a link as the trigger.

Wait on the response and the click in one Promise.all. The listener has to be attached before the click fires; otherwise a fast server can answer before waitForResponse is listening, and the wait hangs until the timeout. Because waitForResponse is listed first in the Promise.all array, its listener is registered before button.click() is called. The .catch(() => null) keeps a missed response from throwing, since the count check is the real source of truth.
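
The ordering can also be made explicit in a small helper. This is a minimal sketch rather than part of the script: clickAndWaitForBatch is an illustrative name, page and button are assumed to be a Puppeteer Page and ElementHandle, and urlFragment stands for whatever substring identifies the batch request on your site.

```javascript
/* Sketch: click a "Load more" handle and wait for its batch response in one step.
   clickAndWaitForBatch and urlFragment are hypothetical names for illustration. */
async function clickAndWaitForBatch(page, button, urlFragment, timeout = 15000) {
  /* Calling waitForResponse first registers the listener before the click is dispatched. */
  const pending = page
    .waitForResponse(
      res => res.url().includes(urlFragment) && res.status() === 200,
      { timeout }
    )
    .catch(() => null) /* A timeout resolves to null instead of throwing. */

  await button.click()
  return pending /* The matching response, or null if none arrived in time. */
}
```

Inside the loop this would be called as clickAndWaitForBatch(page, button, '/ajax'). A null result is not fatal on its own, because the item count is still checked afterwards.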

Confirm the DOM grew before trusting the click. A 200 response does not guarantee the rows are painted. page.waitForFunction polls the live page until querySelectorAll(sel).length exceeds the previous count, which bridges the gap between the response landing and the framework rendering the new items.

Use a retry guard, not a single failed click, to exit. One click that adds nothing might be a response that came back slowly. The emptyClicks counter breaks the loop only after two consecutive no-progress clicks, so a single slow batch does not end the run with half the list. It resets to zero every time the count rises.
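
The exit logic can be separated from the browser calls, which also leaves room for the maxClicks ceiling mentioned under Gotchas. The following is a minimal sketch, not the script's own code: expandList, clickOnce, and the default maxClicks of 200 are assumptions made for illustration.

```javascript
/* Sketch of the exit logic on its own. clickOnce and countItems are hypothetical async
   callbacks supplied by the caller, for example thin wrappers around the Puppeteer
   calls in the script. maxClicks is an assumed safety ceiling. */
async function expandList({ clickOnce, countItems, maxEmptyClicks = 2, maxClicks = 200 }) {
  let previousCount = await countItems()
  let emptyClicks = 0
  let clicks = 0

  while (emptyClicks < maxEmptyClicks && clicks < maxClicks) {
    const clicked = await clickOnce() /* Resolves false when there is no button left. */
    if (!clicked) break
    clicks++

    const currentCount = await countItems()
    if (currentCount > previousCount) {
      previousCount = currentCount
      emptyClicks = 0 /* Progress made, reset the guard. */
    } else {
      emptyClicks++ /* No new rows; allow a retry before giving up. */
    }
  }

  return { count: previousCount, clicks, emptyClicks }
}
```

Keeping the loop free of Puppeteer calls means the stop conditions can be exercised with fake callbacks before pointing the scraper at a real site.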

Gotchas

  • The wait is attached after the click, so it hangs.

    • Issue: writing await button.click() on its own line and then await page.waitForResponse(...) lets a fast server respond in the gap between the two lines, so the listener never sees the response and the wait runs to its full timeout on every iteration.
    • Fix: put the wait and the click in the same Promise.all([...]) with the wait listed first, so the listener is registered before the click is dispatched.
  • The button is matched by a class that changes between deploys.

    • Issue: page.click('.load-more-btn') throws as soon as the site ships a CSS refactor and renames the class, and the same class is often reused on unrelated buttons.
    • Fix: match on visible text with a small helper that reads textContent, which is stable across markup changes and portable between sites.
  • The DOM count is read before the new rows render.

    • Issue: the XHR returns 200 but the framework has not painted the rows yet, so countItems() returns the old number and a successful click is wrongly counted as no-progress, which can end the loop before the list is fully expanded.
    • Fix: add page.waitForFunction after the response to block until querySelectorAll(sel).length actually exceeds the previous count.
  • The button stays in the DOM but is disabled or hidden at the end.

    • Issue: some sites leave the button in place and set disabled or display:none when the list is exhausted. A disabled button clicks as a no-op, and a display:none button makes button.click() throw an error along the lines of "Node is either not clickable or not an Element" (the exact wording varies by Puppeteer version), so findButtonByText keeps handing back a handle that either wastes a full timeout on each no-op click or crashes the script.
    • Fix: before clicking, skip a button that is not actionable. Drop it when el.disabled is true or el.offsetParent is null, and treat that as the end of the list. Note that offsetParent is also null for position:fixed elements, so check the computed style instead if the button is fixed. The emptyClicks guard is the backstop for a button that stays clickable but stops adding rows.
  • waitForResponse matches the wrong request on a chatty page.

    • Issue: a loose predicate like res => res.status() === 200 resolves on the first analytics ping or image, not the data call, so the loop proceeds before the items have loaded.
    • Fix: open the Network tab, find the request the button actually fires, and match its real path, for example res.url().includes('/ajax') or a specific query string.
  • The page lazy-loads images, so scraped src values are blank.

    • Issue: images inside the newly loaded rows use data-src and a blank or placeholder src until they scroll into view, so reading img.src returns the placeholder.
    • Fix: read the data-src attribute instead, or scroll each new batch into view before extracting. See How to scrape lazy-loaded images.
  • The list runs to thousands of rows and the run never ends.

    • Issue: a feed that loads forever turns the while loop into an effectively infinite scrape that fills memory with DOM nodes.
    • Fix: add a maxClicks ceiling or a target count to the loop condition, and break once you have enough rows for the job.
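
Two of the fixes above, skipping a button that is no longer actionable and reading data-src for lazy-loaded images, fit in small helpers. The sketch below assumes each helper receives a Puppeteer ElementHandle; isActionable and imageUrl are illustrative names, not part of Puppeteer or of the script.

```javascript
/* Sketch: in-page checks for two of the gotchas. Both helpers run a small function
   against the element through handle.evaluate. The helper names are hypothetical. */

/* True when the button can still be clicked: not disabled and not display:none.
   Caveat: offsetParent is also null for position:fixed elements. */
async function isActionable(handle) {
  return handle.evaluate(el => !el.disabled && el.offsetParent !== null)
}

/* Prefer the lazy-load attribute and fall back to the plain src attribute.
   data-src is the attribute named in the gotcha; some sites use a different one. */
async function imageUrl(handle) {
  return handle.evaluate(
    img => img.getAttribute('data-src') || img.getAttribute('src') || null
  )
}
```

In the loop, a guard such as if (!button || !(await isActionable(button))) break would treat a hidden or disabled button as the end of the list. imageUrl returns the attribute value as written in the markup, so relative paths still need resolving against the page URL.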

Use this when

A list, search result, or catalog page hides most of its content behind a button you have to click, the button fires an XHR for each new batch, and you want every item in one pass.

Skip this when

The rows load automatically as you scroll instead of on a click (use an infinite-scroll loop); the site exposes the same data through a paginated API or a ?page=N URL (fetch the JSON directly, which is lighter than driving a browser); the full list is already in the initial HTML (a plain fetch and a parser are enough); or you only need the first visible batch (no clicking required).
