How to scrape paginated results in JavaScript

Updated 2026-06-25 · 5 min read

If the listing you're scraping spreads its results across numbered pages, fetching page one gives you the first 10 or 20 rows and nothing else. The rest sit behind ?page=2, ?page=3, and so on, and you do not know up front how many pages there are, so you cannot just hardcode a range and call it done.

The fix is a loop that increments the page number until a page comes back empty. We'll build a small script that does four things: it builds each page URL from a template, so a path-segment pager and a query-string pager both work; it fetches the HTML with a normal browser User-Agent header so the server returns the real page; it pulls the rows out with a CSS selector; and it stops the first time a page returns no rows, so the loop discovers the page count instead of guessing it. A page cap keeps a misbehaving site from spinning the loop without end. It comes out to about 35 lines of Node.js with one open-source library plus the built-in fetch.

The complete script

// scrape-paginated.mjs
import * as cheerio from 'cheerio'

// {page} is replaced with the page number on each pass.
// swap to 'https://example.com/products?page={page}' for query-string pagination.
const urlTemplate = 'https://quotes.toscrape.com/page/{page}/'
const rowSelector = '.quote .text'

// a safety cap so a site that never returns an empty page cannot loop without end.
const maxPages = 50

const results = []

for (let page = 1; page <= maxPages; page++) {
  const url = urlTemplate.replace('{page}', String(page))

  const res = await fetch(url, {
    headers: { 'User-Agent': 'Mozilla/5.0' }
  })

  // a 404 past the last page is also an end-of-data signal on some sites.
  if (res.status === 404) break
  if (!res.ok) throw new Error(`page ${page} returned HTTP ${res.status}`)

  const $ = cheerio.load(await res.text())
  const rows = $(rowSelector).map((_, el) => $(el).text().trim()).get()

  // the stop condition: an empty page means we have walked past the last page.
  if (rows.length === 0) break

  results.push(...rows)
  console.log(`page ${page}: ${rows.length} rows (running total ${results.length})`)
}

console.log(`done: ${results.length} rows across the paginated listing`)

Install cheerio and run the script (Node.js 18 or later, for the built-in fetch):

npm install cheerio
node scrape-paginated.mjs

How it works

Build the URL from a template. The page number is the only part of the URL that changes between requests, so the template carries a {page} placeholder and the loop substitutes the current number with replace. The example target paginates with a path segment (/page/2/); a site that paginates with a query string uses ?page={page} instead, and nothing else in the loop changes. A few APIs and listings treat ?page=0 as the first page, so check the first page in a browser. If ?page=0 is the first page, start the loop at page = 0 and adjust the cap, since starting at 1 there silently skips the first batch of rows.
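To make the template step concrete, here is a minimal sketch that isolates it from the loop. The names buildPageUrl and pageNumbers are hypothetical helpers, not part of the script above, and the firstPage value is an assumption you confirm in a browser.

```javascript
// Hypothetical helper: builds the URL for one page from a {page} template.
function buildPageUrl(template, page) {
  if (!template.includes('{page}')) {
    throw new Error('template is missing the {page} placeholder')
  }
  return template.replace('{page}', String(page))
}

// Hypothetical helper: lists the page numbers the loop would visit.
// firstPage is 1 for most sites and 0 for zero-based pagers (an assumption to verify).
function pageNumbers(firstPage, maxPages) {
  return Array.from({ length: maxPages }, (_, i) => firstPage + i)
}

console.log(buildPageUrl('https://quotes.toscrape.com/page/{page}/', 2))
console.log(buildPageUrl('https://example.com/products?page={page}', 0))
console.log(pageNumbers(0, 3))
```

Keeping the first page number as a parameter means a zero-based pager needs one changed argument rather than an edited loop header.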

Fetch with a stock desktop User-Agent. A bare fetch() from Node sends node as its User-Agent, which some servers reject. A normal Mozilla/5.0 string gets the full page back from most sites. This is politeness, not stealth: a site that blocks bots in earnest checks far more than a header string. Keep the requests sequential as written. Swapping the loop for Promise.all over a page range fires every request in the same instant, which many servers answer with 429 Too Many Requests, so cap concurrency and add backoff if you need the speed. See How to rate-limit requests with backoff in JavaScript.
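As a rough illustration of the backoff idea, the sketch below retries a single page when the server answers 429, doubling the wait each time. fetchWithBackoff is a hypothetical helper, the retry count and base delay are assumed values, and it does not read a Retry-After header, which a fuller version probably should.

```javascript
// Hypothetical helper: retries one request when the server answers 429.
// fetchImpl and sleep are injectable so the sketch can be tried without a network.
async function fetchWithBackoff(url, options = {}) {
  const {
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    maxRetries = 3, // assumed value, tune per site
    baseDelayMs = 1000 // assumed value, tune per site
  } = options

  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, {
      headers: { 'User-Agent': 'Mozilla/5.0' }
    })

    // anything other than 429 goes back to the caller, as does the last 429.
    if (res.status !== 429 || attempt >= maxRetries) return res

    // wait 1s, 2s, 4s, ... before asking for the same page again.
    await sleep(baseDelayMs * 2 ** attempt)
  }
}
```

In the main loop this would stand in for the direct fetch call, for example const res = await fetchWithBackoff(url), with the 404 and res.ok checks left unchanged.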

Parse and select the rows. cheerio.load() parses the HTML into a jQuery-style document, and the CSS selector pulls out the elements you want. The .map().get() pair turns the matched nodes into a plain array of strings, one entry per row on that page. One failure mode dominates: fetch only sees the server's initial HTML, so a listing that renders rows client-side with React or Vue hands back an empty shell and the loop stops at page one. In that case, find the JSON endpoint the page calls in the browser's Network tab and paginate that directly, or render each page with Puppeteer first. See How to scrape a JavaScript-rendered page in Node.js.
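When the rows come from a JSON endpoint, the same loop applies with res.json() in place of cheerio. The sketch below assumes a hypothetical endpoint template and a response shaped like { items: [...] }; both are illustrations only, and a real endpoint will use its own field names.

```javascript
// Hypothetical helper: walks a paginated JSON endpoint.
// Assumes each response body looks like { items: [...] } (not a real API contract).
async function scrapeJsonPages(endpointTemplate, { fetchImpl = fetch, maxPages = 50 } = {}) {
  const results = []

  for (let page = 1; page <= maxPages; page++) {
    const url = endpointTemplate.replace('{page}', String(page))
    const res = await fetchImpl(url, {
      headers: { 'User-Agent': 'Mozilla/5.0', Accept: 'application/json' }
    })

    // same end-of-data and error handling as the HTML loop.
    if (res.status === 404) break
    if (!res.ok) throw new Error(`page ${page} returned HTTP ${res.status}`)

    const data = await res.json()
    const items = Array.isArray(data.items) ? data.items : []

    // an empty items array plays the role of the empty HTML page.
    if (items.length === 0) break

    results.push(...items)
  }

  return results
}
```

A call might look like await scrapeJsonPages('https://example.com/api/products?page={page}'), where the URL is a placeholder for whatever request the Network tab shows.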

Stop on an empty page. Once a fetched page yields zero rows, the listing is exhausted and the loop breaks. The maxPages cap is the backstop for sites that wrap around to page one or echo the last page instead of returning an empty one, so the loop is bounded even when the empty-page signal never arrives. Some sites clamp an out-of-range page to the last valid one, so ?page=999 returns the same rows as the final page and the empty-page check never fires. To catch that, keep the previous page's rows in prevRows and break when JSON.stringify(rows) === JSON.stringify(prevRows). To avoid fetching one extra page past the end just to see zero rows, read the pager where one exists: on a site that renders a li.next > a link, collect the current page's rows first, then stop as soon as the link is absent with if ($('li.next > a').length === 0) break. When the listing is sorted by a value that changes while you scrape (recently updated, price), the same row can shift between pages and get collected twice, so dedupe by a stable key after the loop rather than trusting page boundaries. See How to deduplicate scraped records in JavaScript.
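The stop conditions described above can be sketched together as follows. collectUntilExhausted and dedupeBy are hypothetical helpers, and getRows stands in for the fetch-and-parse step of the main script; this is one possible arrangement, not the only one.

```javascript
// Hypothetical helper: getRows(page) is assumed to return a promise of that page's rows.
async function collectUntilExhausted(getRows, maxPages = 50) {
  const results = []
  let prevRows = null

  for (let page = 1; page <= maxPages; page++) {
    const rows = await getRows(page)

    // empty page: we have walked past the last page.
    if (rows.length === 0) break

    // clamped pager: an out-of-range page repeats the final page's rows.
    if (prevRows !== null && JSON.stringify(rows) === JSON.stringify(prevRows)) break

    results.push(...rows)
    prevRows = rows
  }

  return results
}

// Hypothetical helper: keeps the first row seen for each key.
function dedupeBy(rows, keyOf) {
  const seen = new Map()
  for (const row of rows) {
    const key = keyOf(row)
    if (!seen.has(key)) seen.set(key, row)
  }
  return [...seen.values()]
}
```

The repeated-page check compares whole pages, so two legitimately identical consecutive pages would also end the loop; on listings where that can happen, the pager-link check is the safer signal.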

Use this when

The listing paginates with a page number in the URL (?page=2 or /page/2/) and the rows are present in the server-rendered HTML. This covers most server-rendered catalogs, search result pages, forum indexes, and blog archives.

Skip this when

The page loads more rows on scroll with no page number (use an infinite scroll loop); the next batch appears behind a button click (use a load-more loop); the rows are rendered client-side from an XHR call (paginate the JSON endpoint or render with Puppeteer); the pager uses a cursor token rather than a page number (carry the cursor from each response into the next request).
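For the cursor case, a minimal sketch of carrying the token forward might look like this. The cursor query parameter and the { items, nextCursor } response shape are assumptions for illustration; real APIs name these fields differently, so check the actual responses first.

```javascript
// Hypothetical helper: follows a cursor token from each response into the next request.
// Assumes responses shaped like { items: [...], nextCursor: 'token' or null }.
async function scrapeByCursor(baseUrl, { fetchImpl = fetch, maxRequests = 50 } = {}) {
  const results = []
  let cursor = null

  for (let i = 0; i < maxRequests; i++) {
    const url = new URL(baseUrl)
    // the first request carries no cursor; later ones send the token from the previous response.
    if (cursor !== null) url.searchParams.set('cursor', cursor)

    const res = await fetchImpl(url.href, {
      headers: { 'User-Agent': 'Mozilla/5.0' }
    })
    if (!res.ok) throw new Error(`request ${i + 1} returned HTTP ${res.status}`)

    const data = await res.json()
    results.push(...(data.items ?? []))

    // a missing or empty token means there is no next batch.
    if (!data.nextCursor) break
    cursor = data.nextCursor
  }

  return results
}
```

The maxRequests cap plays the same role as maxPages: it bounds the loop if the server keeps handing back a token.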
