How to scrape an API with pagination in JavaScript

Updated 2026-06-25 · 6 min read

If you found the JSON API behind a site and called it once, you have probably noticed the response only carries the first slice of the data: 20 or 50 items, plus a field like next_cursor, total, or has_more that hints there is more behind it. Hardcoding ?page=2, ?page=3 and so on works until the API switches scheme, returns a duplicate page, or runs past the end and you cannot tell where to stop.

The fix is to read the pagination metadata the API already returns and let it drive a loop that requests the next slice until the API says there is no more. We'll build a small script that wraps every request in one helper so timeout and error handling stay identical, then pages through whichever of the three schemes the API uses: cursor (follow the opaque next_cursor token until it goes empty), offset (advance a numeric position by the page size until a short page arrives), or page-number (read the reported total and request pages until you have it). Every loop carries a hard page cap so a malformed stop signal cannot run forever. Each scheme is a short loop on top of fetch, around 30 lines, no dependencies beyond Node 18+.

The complete script

// paginate-api.mjs

/* a small helper so every request shares timeout and JSON handling.
   AbortSignal.timeout is built into Node 18+. */
async function getJSON(url) {
  const res = await fetch(url, {
    headers: { 'Accept': 'application/json' },
    signal: AbortSignal.timeout(15000)
  })
  if (!res.ok) throw new Error(`${res.status} ${res.statusText} for ${url}`)
  return res.json()
}

/* 1. cursor pagination.
   follow the opaque token until the API stops returning one.
   GitHub, Stripe, Twitter-style APIs work this way. */
async function paginateByCursor(baseUrl, pageCap = 50) {
  const all = []
  let cursor = null
  let page = 0

  while (page < pageCap) {
    const url = cursor
      ? `${baseUrl}?limit=100&cursor=${encodeURIComponent(cursor)}`
      : `${baseUrl}?limit=100`
    const body = await getJSON(url)

    all.push(...body.items)
    page++

    /* the API hands back a fresh token while there is more data,
       and omits it (null / undefined / '') on the final page. */
    cursor = body.next_cursor
    if (!cursor) break
  }

  return all
}

/* 2. offset pagination.
   advance a numeric offset by the page size. stop on a short page:
   a page smaller than the size we asked for is the last one. */
async function paginateByOffset(baseUrl, pageSize = 50, pageCap = 200) {
  const all = []
  let offset = 0

  for (let page = 0; page < pageCap; page++) {
    const url = `${baseUrl}?limit=${pageSize}&offset=${offset}`
    const body = await getJSON(url)

    all.push(...body.items)

    /* a full page might still have more behind it, so keep going.
       a short or empty page means we have reached the end. */
    if (body.items.length < pageSize) break
    offset += pageSize
  }

  return all
}

/* 3. page-number pagination.
   read the reported total once, then request pages until we have it.
   WordPress, many REST APIs, and search endpoints work this way. */
async function paginateByPageNumber(baseUrl, perPage = 50, pageCap = 200) {
  const all = []
  const first = await getJSON(`${baseUrl}?per_page=${perPage}&page=1`)
  all.push(...first.results)

  /* total_count is the number of records across all pages.
     Math.ceil gives the page count; we already have page 1. */
  const totalPages = Math.min(
    Math.ceil(first.total_count / perPage),
    pageCap
  )

  for (let page = 2; page <= totalPages; page++) {
    const body = await getJSON(`${baseUrl}?per_page=${perPage}&page=${page}`)
    all.push(...body.results)
  }

  return all
}

/* swap in the API you are paging and call the matching function. */
const records = await paginateByCursor('https://api.example.com/v1/orders')
console.log(`Collected ${records.length} records`)

Run it from a shell:

node paginate-api.mjs

How it works

Share one request helper. getJSON sets Accept: application/json, wraps every call in a 15-second AbortSignal.timeout, and throws on any non-2xx status so a 429 or 500 surfaces instead of being parsed as JSON. Reuse it across all three loops so the timeout and error handling stay identical. Before wiring any loop, log the first raw response with console.log(JSON.stringify(body, null, 2).slice(0, 2000)) and read its real shape, since APIs name these fields data, next, meta.cursor, total, or nest them under pagination, and a wrong path returns undefined so the loop collects nothing. If pages fired back to back trip the API's per-minute limit, getJSON throws on the 429 and loses everything collected so far, so add a short pause between requests and honor the Retry-After header on a 429; see How to rate-limit requests with backoff in JavaScript.
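A minimal sketch of the pause-and-retry idea, assuming the API sends Retry-After as a number of seconds (the header may also be an HTTP date, which this sketch does not parse and instead treats as missing). The name getJSONWithRetry and the fetchImpl and sleep options are hypothetical; they exist only so the helper can be exercised without a network.

```javascript
// Sketch: same contract as getJSON, plus a bounded retry on 429.
// fetchImpl and sleep are hypothetical injection points for offline testing.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function getJSONWithRetry(
  url,
  { maxRetries = 3, fetchImpl = fetch, sleep = defaultSleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, {
      headers: { Accept: 'application/json' },
      signal: AbortSignal.timeout(15000)
    })

    if (res.status === 429 && attempt < maxRetries) {
      // Use Retry-After when it is a positive number of seconds,
      // otherwise fall back to exponential backoff: 1s, 2s, 4s, ...
      const seconds = Number(res.headers.get('Retry-After'))
      const waitMs = Number.isFinite(seconds) && seconds > 0
        ? seconds * 1000
        : 1000 * 2 ** attempt
      await sleep(waitMs)
      continue
    }

    if (!res.ok) throw new Error(`${res.status} ${res.statusText} for ${url}`)
    return res.json()
  }
}
```

Swapping this in for getJSON keeps the loops unchanged. Once the retries are used up, a 429 still throws, so the failure stays visible.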

Cursor: follow the token, stop when it goes empty. Read body.next_cursor after each request and pass it back as ?cursor=, URL-encoding it because cursor tokens are opaque and often contain characters like = or + that the query string would otherwise mangle. The API supplies a token while more data exists and omits it on the last page, so if (!cursor) break is the natural stop. Some APIs return the same token on the final page instead of dropping it, which would spin until the cap, so track the previous cursor and also break when it repeats. For very long tokens that some servers reject in a query string, send the cursor in a POST body instead, if the API accepts one. This is the scheme to prefer when offered, because the token marks a position in the data rather than a count, so it stays valid even as new records arrive during the scrape.
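The repeated-token guard can be sketched as follows. This is one possible shape, not part of the script above: fetchPage is a hypothetical callback that takes the current cursor (or null on the first call) and resolves to a body shaped like { items, next_cursor }.

```javascript
// Sketch: cursor loop that stops on an empty token OR a repeated one.
// fetchPage(cursor) is a hypothetical callback returning { items, next_cursor }.
async function paginateByCursorGuarded(fetchPage, pageCap = 50) {
  const all = []
  let cursor = null

  for (let page = 0; page < pageCap; page++) {
    const body = await fetchPage(cursor)
    all.push(...body.items)

    const next = body.next_cursor
    // An empty token is the normal stop. A token identical to the one
    // we just sent means the API is pointing back at the same page.
    if (!next || next === cursor) break
    cursor = next
  }

  return all
}

// Possible wiring to the article's helper (not executed here):
// const orders = await paginateByCursorGuarded((cursor) =>
//   getJSON(cursor
//     ? `https://api.example.com/v1/orders?limit=100&cursor=${encodeURIComponent(cursor)}`
//     : 'https://api.example.com/v1/orders?limit=100'))
```

Comparing against only the previous token catches the common case. A Set of every token seen would also catch longer cycles, at the cost of holding them all in memory.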

Offset: advance by the page size, stop on a short page. Request ?limit=50&offset=0, then offset=50, offset=100, and so on. A page that returns fewer items than pageSize is the last one, which if (body.items.length < pageSize) break catches without needing a total. Offset pagination can skip or repeat records if rows are inserted or deleted mid-scrape, because the window shifts under you, so it suits stable datasets better than fast-changing feeds. If only offset is offered for live data, sort ascending by a stable immutable key that only grows, such as an auto-incrementing id (?order_by=id&offset=...), so new rows land past the end of the window instead of shifting it. Rows deleted mid-scrape can still pull later records forward and cause a skip.
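The shifting window is easier to see in a toy simulation. The sketch below uses an in-memory array as a stand-in for the API; makeFeed and the single-letter rows are made up for illustration.

```javascript
// Simulation only: an in-memory list standing in for an offset-paged API.
function makeFeed(rows) {
  return {
    rows,
    page(offset, limit) {
      return this.rows.slice(offset, offset + limit)
    }
  }
}

// Newest-first feed: an insert lands at the front and shifts every offset.
const feed = makeFeed(['e', 'd', 'c', 'b', 'a'])
const page1 = feed.page(0, 2)   // ['e', 'd']
feed.rows.unshift('f')          // a new row arrives between requests
const page2 = feed.page(2, 2)   // ['d', 'c']: 'd' comes back a second time

// Ascending by a key that only grows: the insert lands past the window.
const stable = makeFeed(['a', 'b', 'c', 'd', 'e'])
const s1 = stable.page(0, 2)    // ['a', 'b']
stable.rows.push('f')
const s2 = stable.page(2, 2)    // ['c', 'd']: no repeat, no skip

console.log({ page1, page2, s1, s2 })
```

In the first feed the record d is collected twice. In the second the pages line up, because the new row only extends the tail.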

Page-number: read the total, compute the page count. The first request returns total_count, and Math.ceil(total_count / perPage) gives how many pages cover it. The loop starts at page 2 because page 1 is already collected. If the API does not return a total, total_count is undefined, the computed page count becomes NaN, and the loop never runs, so the function quietly returns page 1 only. In that case fall back to the offset stop condition: keep requesting until a page comes back shorter than perPage.
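A sketch of that fallback, assuming the same results and total_count field names the script uses. The function name and the fetchPage callback (page number in, response body out) are hypothetical.

```javascript
// Sketch: page-number loop that uses total_count when present and
// falls back to the short-page stop when it is missing.
// fetchPage(page) is a hypothetical callback returning { results, total_count? }.
async function paginateByPageNumberOrShortPage(fetchPage, perPage = 50, pageCap = 200) {
  const first = await fetchPage(1)
  const all = [...first.results]

  const hasTotal = Number.isFinite(first.total_count)
  const totalPages = hasTotal
    ? Math.ceil(first.total_count / perPage)
    : Infinity
  let lastLength = first.results.length

  for (let page = 2; page <= Math.min(totalPages, pageCap); page++) {
    // Without a total, a short previous page is the only stop signal.
    if (!hasTotal && lastLength < perPage) break

    const body = await fetchPage(page)
    all.push(...body.results)
    lastLength = body.results.length
  }

  return all
}
```

When the record count is an exact multiple of perPage and no total is reported, the fallback costs one extra request, which returns an empty page and ends the loop.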

Cap every loop. pageCap bounds each function regardless of what the API reports. A next_cursor that points at itself, a total_count that disagrees with reality, or a has_more that never flips would otherwise spin forever; the cap turns that into a bounded request count you can reason about. Size it to the dataset (Math.ceil(expectedTotal / pageSize) plus a margin) rather than leaving the default: a 60,000-record set at 100 per page needs 600 requests, and the default cap of 50 or 200 would stop you early while looking like a clean finish. Log when a loop exits on the cap rather than on the API's own stop signal, so a truncated result is never mistaken for a complete one.
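Both halves of that advice, sizing the cap and reporting a cap exit, might look like the sketch below. The names sizePageCap and paginateWithCapReport, the fetchPage(offset, limit) callback, and the margin of 10 extra pages are illustrative choices, not values from the script.

```javascript
// Sketch: derive a cap from the expected dataset size.
// marginPages is an arbitrary safety margin; adjust to taste.
function sizePageCap(expectedTotal, pageSize, marginPages = 10) {
  return Math.ceil(expectedTotal / pageSize) + marginPages
}

// Sketch: offset loop that records WHY it stopped.
// fetchPage(offset, limit) is a hypothetical callback returning { items }.
async function paginateWithCapReport(fetchPage, pageSize, pageCap) {
  const all = []
  let stoppedBy = 'cap'

  for (let page = 0; page < pageCap; page++) {
    const body = await fetchPage(page * pageSize, pageSize)
    all.push(...body.items)

    if (body.items.length < pageSize) {
      stoppedBy = 'short page'
      break
    }
  }

  if (stoppedBy === 'cap') {
    console.warn(`Stopped at pageCap=${pageCap}; the result may be truncated`)
  }
  return { items: all, stoppedBy }
}

// 60,000 records at 100 per page: 600 pages plus the margin of 10.
console.log(sizePageCap(60000, 100)) // 610
```

Returning stoppedBy alongside the items lets the caller decide whether a cap exit should fail the run or just be logged.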

Use this when

You have located a JSON API (often by watching the Network tab) that returns data in pages and you want every record, whether it pages by cursor, offset, or page number. This is the right tool for REST and JSON endpoints that hand back structured data plus pagination metadata.

Skip this when

The data is rendered into HTML rather than served as JSON (parse the HTML with cheerio instead); the "next page" is a button that fires more requests without changing the URL (drive it with Puppeteer, see the infinite-scroll guide); the API exposes a bulk export or a single ?limit=all endpoint (use that and skip paging); or each page needs an auth token that expires mid-scrape (refresh the token inside the loop before it lapses).
