How to handle load more buttons when scraping
Updated 2026-06-24 · 6 min read
If the page you're scraping only shows its first handful of results until you click a "Load more" button, a plain fetch of the HTML will only ever return that first handful. The rest of the rows load in later, a batch at a time, each time the button is clicked, so to get all of them your scraper has to do the clicking itself.
The hard part is timing: you have to know when each click has actually finished loading before you fire the next one. The solution is to drive the button in a loop with Puppeteer and wait for the network response after each click, stopping once the button is gone or stops adding rows. It takes about 70 lines of Node.js with one open-source library.
Key terms
- XHR. The background request the page fires to fetch the next batch of items when the button is clicked, without reloading the page.
- page.waitForResponse. A Puppeteer method that blocks until a network response matching a predicate arrives, used here to wait for the batch request to return 200.
- page.waitForFunction. A Puppeteer method that polls the live page until a condition is true, used here to wait until the item count actually rises after the response.
- Render tick. The short gap between the data arriving and the framework painting the new rows into the DOM, which is why the count is not trusted immediately.
- $$eval. A Puppeteer method that runs a function over all elements matching a selector inside the page and returns the result, used to count and to read out the items.
Here is what the script does:
- Launch headless Chrome with Puppeteer and open the list page that hides its full content behind a "Load more" button.
- Find the button by its visible text rather than a brittle CSS class, so the loop survives a markup change.
- Click the button and wait for the matching XHR response with page.waitForResponse, so the next iteration only starts once the new items have actually arrived.
- Stop when the button disappears or stops adding rows, with a retry counter that tolerates a couple of no-progress clicks before giving up.
- Read the items out of the fully expanded DOM in one pass.
The complete script
// load-more.mjs
import puppeteer from 'puppeteer'
const url = 'https://www.scrapingcourse.com/button-click'
const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()
await page.goto(url, { waitUntil: 'networkidle2' })
/* Find a button by its visible text. Class names churn; the label "Load more" rarely does. */
async function findButtonByText(text) {
  const handles = await page.$$('button, a[role="button"]')
  for (const handle of handles) {
    const label = await handle.evaluate(el => el.textContent.trim().toLowerCase())
    if (label.includes(text.toLowerCase())) return handle
  }
  return null
}
/* Count the items currently in the DOM so we can tell whether a click actually added any. */
const countItems = () => page.$$eval('.product-item', els => els.length)
let previousCount = await countItems()
let emptyClicks = 0
const maxEmptyClicks = 2
while (emptyClicks < maxEmptyClicks) {
  const button = await findButtonByText('load more')
  if (!button) break /* Button gone means the list is fully expanded. */
  /* Click and wait for the data response together. Starting the wait before the click
     avoids the race where the XHR resolves before the listener is attached. */
  const [response] = await Promise.all([
    page.waitForResponse(
      res => res.url().includes('/ajax') && res.status() === 200,
      { timeout: 15000 }
    ).catch(() => null),
    button.click()
  ])
  /* The DOM updates a tick after the response lands; wait for the count to actually rise. */
  await page
    .waitForFunction(
      (prev, sel) => document.querySelectorAll(sel).length > prev,
      { timeout: 5000 },
      previousCount,
      '.product-item'
    )
    .catch(() => null)
  const currentCount = await countItems()
  if (currentCount > previousCount) {
    previousCount = currentCount
    emptyClicks = 0 /* Progress made, reset the guard. */
  } else {
    emptyClicks++ /* No new rows. Could be a slow response, so retry before giving up. */
  }
}
const items = await page.$$eval('.product-item', els =>
  els.map(el => ({
    name: el.querySelector('.product-name')?.textContent.trim() ?? null,
    price: el.querySelector('.product-price')?.textContent.trim() ?? null
  }))
)
console.log(`Loaded ${items.length} items`)
console.log(items.slice(0, 5))
await browser.close()
Install the dependency and run the script:
npm install puppeteer
node load-more.mjs
What each step does
Open the page with networkidle2. Many list pages load their first batch over XHR after the initial HTML. Waiting for the network to settle means the first items and the button are present before the loop starts, so the first findButtonByText call does not return null on a page that is still booting.
Match the button by text, not by class. A selector like .btn-load-more-v2 ties your scraper to one build of the site's CSS. Reading textContent and matching "load more" survives a class rename and works across the many sites that label this button the same way. The helper checks both <button> and <a role="button"> because plenty of sites style a link as the trigger.
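The matching rule is small enough to pull out and check on its own. The following is a minimal sketch, assuming a hypothetical helper named matchesLabel that is not part of Puppeteer or of the script above. Unlike the script's plain trim(), it also collapses internal whitespace, which helps when the label is split across nested elements or lines.

```javascript
// Hypothetical helper: normalise a label roughly the way a reader sees it,
// then test for the wanted phrase, ignoring case.
function matchesLabel(rawText, wanted) {
  const normalise = s => s.replace(/\s+/g, ' ').trim().toLowerCase()
  return normalise(rawText).includes(normalise(wanted))
}

console.log(matchesLabel('\n  Load   More \n', 'load more')) // true
console.log(matchesLabel('Add to cart', 'load more'))        // false
```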
Wait on the response and the click in one Promise.all. The listener has to be attached before the click fires, otherwise a quick server answers before waitForResponse is listening and the wait hangs until the timeout. Putting both in the same Promise.all attaches the listener first, then clicks. The .catch(() => null) keeps a missed response from throwing, since the count check is the real source of truth.
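The ordering can also be made explicit without Promise.all. This is a sketch under assumptions, not the script's own code: clickAndAwaitBatch is a hypothetical name, page and button stand for a Puppeteer Page and ElementHandle, and urlFragment is whatever path the batch request uses on the target site.

```javascript
// Hypothetical helper wrapping the click-and-wait pair.
async function clickAndAwaitBatch(page, button, urlFragment, timeout = 15000) {
  // waitForResponse() is called first, so its listener exists before click() runs.
  const pending = page
    .waitForResponse(
      res => res.url().includes(urlFragment) && res.status() === 200,
      { timeout }
    )
    .catch(() => null) // a missed response is not fatal; the item count decides
  await button.click()
  return pending // resolves to the matching response, or null on timeout
}
```

Starting the wait, storing the promise, and only then clicking keeps the same guarantee as the Promise.all form while making the sequence visible line by line.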
Confirm the DOM grew before trusting the click. A 200 response does not guarantee the rows are painted. page.waitForFunction polls the live page until querySelectorAll(sel).length exceeds the previous count, which bridges the gap between the response landing and the framework rendering the new items.
Use a retry guard, not a single failed click, to exit. One click that adds nothing might just be a slow response. The emptyClicks counter ends the loop only after two consecutive no-progress clicks (maxEmptyClicks = 2), so one slow batch does not end the run with half the list. The counter resets to zero every time the count rises.
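The guard's bookkeeping can be traced without a browser. The sketch below restates it as a pure function. updateGuard is a hypothetical name and the item counts are made-up illustration values, not measurements from the demo page.

```javascript
// Hypothetical pure version of the guard: given the previous state and the
// count read after a click, return the next state.
function updateGuard({ previousCount, emptyClicks }, currentCount) {
  if (currentCount > previousCount) {
    return { previousCount: currentCount, emptyClicks: 0 } // progress resets the guard
  }
  return { previousCount, emptyClicks: emptyClicks + 1 } // no new rows this time
}

// Illustrative run: counts observed after four clicks, with maxEmptyClicks = 2.
const maxEmptyClicks = 2
let guard = { previousCount: 12, emptyClicks: 0 }
for (const count of [24, 24, 36, 36]) {
  guard = updateGuard(guard, count)
  if (guard.emptyClicks >= maxEmptyClicks) break
}
console.log(guard) // { previousCount: 36, emptyClicks: 1 }
```

The single stalled click in the middle does not end the run, because the next click makes progress and resets the counter.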
Gotchas
The wait is attached after the click, so it hangs.
- Issue: writing await button.click() on its own line and then await page.waitForResponse(...) lets a fast server respond in the gap between the two lines, so the listener never sees the response and the wait runs to its full timeout on every iteration.
- Fix: put the click and the wait in the same Promise.all([...]) so the listener is registered before the click is dispatched.
The button is matched by a class that changes between deploys.
- Issue: page.click('.load-more-btn') throws as soon as the site ships a CSS refactor and renames the class, and the same class is often reused on unrelated buttons.
- Fix: match on visible text with a small helper that reads textContent, which is stable across markup changes and portable between sites.
The DOM count is read before the new rows render.
- Issue: the XHR returns 200 but the framework has not painted the rows yet, so countItems() returns the old number and the loop exits one batch early thinking it is done.
- Fix: add page.waitForFunction after the response to block until querySelectorAll(sel).length actually exceeds the previous count.
The button stays in the DOM but is disabled or hidden at the end.
- Issue: some sites leave the button in place and set disabled or display:none when the list is exhausted. A disabled button clicks as a no-op, and a display:none button makes button.click() throw an error saying the node is not visible or not clickable (the exact wording varies by Puppeteer version), so findButtonByText keeps handing back a handle that either stalls the loop or crashes it.
- Fix: before clicking, skip a button that is not actionable. Drop it when el.disabled is true or el.offsetParent is null, and treat that as the end of the list. The emptyClicks guard is the backstop for a button that stays clickable but stops adding rows.
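The actionability check described in the fix could look like the sketch below. isActionable is a hypothetical name, and the offsetParent test assumes the button is not position: fixed, since offsetParent is also null for fixed-position elements.

```javascript
// Hypothetical in-page check, written so it can be passed to handle.evaluate().
function isActionable(el) {
  if (el.disabled) return false   // a disabled button ignores clicks
  return el.offsetParent !== null // null when the element or an ancestor is display: none
}
```

In the loop this could run as button.evaluate(isActionable) right after findButtonByText, breaking out when it returns false.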
waitForResponse matches the wrong request on a chatty page.
- Issue: a loose predicate like res => res.status() === 200 resolves on the first analytics ping or image, not the data call, so the loop proceeds before the items have loaded.
- Fix: open the Network tab, find the request the button actually fires, and match its real path, for example res.url().includes('/ajax') or a specific query string.
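A tighter predicate can also check the request's resource type, so that an image or script whose URL happens to contain the same fragment is ignored. This is a sketch rather than the script's own predicate: batchResponsePredicate is a hypothetical name, and '/ajax' is only the path used in the script above.

```javascript
// Hypothetical predicate factory for page.waitForResponse().
function batchResponsePredicate(pathFragment) {
  return res => {
    const type = res.request().resourceType() // e.g. 'xhr', 'fetch', 'image', 'script'
    const isDataCall = type === 'xhr' || type === 'fetch'
    return isDataCall && res.url().includes(pathFragment) && res.status() === 200
  }
}
```

It would be passed as page.waitForResponse(batchResponsePredicate('/ajax'), { timeout: 15000 }), with the fragment replaced by the path seen in the Network tab.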
The page lazy-loads images, so scraped src values are blank.
- Issue: images inside the newly loaded rows use data-src and a blank or placeholder src until they scroll into view, so reading img.src returns the placeholder.
- Fix: read the data-src attribute instead, or scroll each new batch into view before extracting. See How to scrape lazy-loaded images.
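Reading data-src with a fallback to src might be sketched as follows. imageUrl is a hypothetical helper, and the '.product-item img' selector in the commented usage is an assumption about the markup. Some sites use a different attribute, such as srcset, so check the rendered HTML first.

```javascript
// Hypothetical helper: prefer the lazy-load attribute, fall back to src.
function imageUrl(img) {
  return img.getAttribute('data-src') || img.getAttribute('src') || null
}

// Inside Puppeteer the same expression is inlined, because $$eval callbacks
// run in the page and cannot see helpers defined on the Node side:
// const urls = await page.$$eval('.product-item img', imgs =>
//   imgs.map(img => img.getAttribute('data-src') || img.getAttribute('src') || null))
```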
The list runs to thousands of rows and the run never ends.
- Issue: a feed that loads forever turns the while loop into an effectively infinite scrape that fills memory with DOM nodes.
- Fix: add a maxClicks ceiling or a target count to the loop condition, and break once you have enough rows for the job.
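One way to fold both ceilings into the loop condition is sketched below. shouldKeepClicking, targetCount, and the default of 50 clicks are illustrative assumptions, not values from the script.

```javascript
// Hypothetical loop condition combining the retry guard with two ceilings.
function shouldKeepClicking({ clicks, itemCount, emptyClicks }, limits = {}) {
  const { maxClicks = 50, targetCount = Infinity, maxEmptyClicks = 2 } = limits
  return (
    clicks < maxClicks &&        // hard cap on button presses
    itemCount < targetCount &&   // stop once enough rows are loaded
    emptyClicks < maxEmptyClicks // existing no-progress guard
  )
}

console.log(shouldKeepClicking({ clicks: 3, itemCount: 48, emptyClicks: 0 }, { targetCount: 40 })) // false
```

The while loop would then test this function instead of emptyClicks alone, incrementing a clicks counter on each pass.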
Use this when
A list, search result, or catalog page hides most of its content behind a button you have to click, the button fires an XHR for each new batch, and you want every item in one pass.
Skip this when
The rows load automatically as you scroll instead of on a click (use an infinite-scroll loop); the site exposes the same data through a paginated API or a ?page=N URL (fetch the JSON directly, which is lighter than driving a browser); the full list is already in the initial HTML (a plain fetch and a parser are enough); or you only need the first visible batch (no clicking required).
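For the paginated case, the browser-free alternative might look like this sketch. collectPages is a hypothetical name, the page fetcher is injected so the loop can be tried without a network, and the URL in the commented usage is a placeholder. The real endpoint, response shape, and end-of-list signal depend on the site.

```javascript
// Hypothetical sketch: walk ?page=N until a page comes back empty.
async function collectPages(fetchPage, maxPages = 100) {
  const all = []
  for (let page = 1; page <= maxPages; page++) {
    const rows = await fetchPage(page)
    if (!rows || rows.length === 0) break // an empty page is taken as the end
    all.push(...rows)
  }
  return all
}

// Possible usage on Node 18+ (placeholder URL, assumes a JSON array per page):
// const rows = await collectPages(n =>
//   fetch(`https://example.com/api/products?page=${n}`).then(r => r.json()))
```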