Simplescraper

How to rate-limit requests with backoff in JavaScript

Updated 2026-06-24 · 6 min read

If you're getting rate limited, you're probably already seeing 429 responses come back, or watching a scraper that ran fine yesterday get throttled today. It is one of the first walls you hit once you send more than a handful of requests to the same server: go too fast and it starts refusing you, and once it has flagged you it can keep refusing for a while after you slow down.

The fix is to hold requests under the rate limit before they reach fetch, and to back off only on 429 and transient 5xx responses. It takes about 70 lines of plain Node.js, with nothing to install.

Key terms

  • Token bucket. A rate limiter that holds up to capacity tokens and refills at a fixed rate. Each request spends one token, so a burst larger than capacity has to wait for refills.
  • Exponential backoff. A retry strategy that doubles the wait after each failed attempt, up to a ceiling, so a struggling server gets progressively more breathing room.
  • Full jitter. Replacing each backoff delay with a random value between zero and the computed delay, so clients that failed together do not all retry at the same instant.
  • Thundering herd. The collision that happens when many clients back off and retry in lockstep, hitting the server in a synchronized wave that trips the limit again.
  • Retry-After. An HTTP response header, in seconds or as a date, that tells the client how long to wait before retrying.

Here is what the script does:

  • Refill a token bucket on a timer so the limiter allows a steady number of requests per second and makes callers wait for a token instead of firing them all at once.
  • Wrap every request in an acquire() call that waits for a free token, so the rate cap is enforced at the source rather than after the server complains.
  • Retry a request that fails with 429 or a 5xx, doubling the wait each attempt up to a ceiling.
  • Add full jitter to every backoff delay so a batch of clients that all tripped the limit at the same moment does not retry in lockstep.
  • Use a numeric Retry-After header (a count of seconds) when the server sends one, instead of guessing the delay.

The complete script

js
// rate-limit-backoff.mjs

/* A token bucket. The bucket holds up to `capacity` tokens and refills
   `refillPerSecond` of them every second. Each request spends one token.
   When the bucket is empty, acquire() waits until the next refill. */
function createRateLimiter({ capacity, refillPerSecond }) {
  let tokens = capacity
  const refillIntervalMs = 1000 / refillPerSecond

  /* Top the bucket up one token at a time on a steady interval. unref() lets
     the process exit even while this timer is technically still pending. */
  setInterval(() => {
    tokens = Math.min(capacity, tokens + 1)
  }, refillIntervalMs).unref()

  async function acquire() {
    /* Poll for a free token. The 25ms sleep is acceptable for small local
       batches; use a queued limiter for larger or high-concurrency scripts. */
    while (tokens < 1) {
      await new Promise(resolve => setTimeout(resolve, 25))
    }
    tokens -= 1
  }

  return { acquire }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

/* Full jitter: a random delay in [0, cap]. This spreads a thundering herd of
   clients that all backed off at the same instant across the whole window
   instead of bunching them at the same retry time. */
function fullJitterDelay(attempt, baseMs, maxMs) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt)
  return Math.random() * exponential
}

const limiter = createRateLimiter({ capacity: 5, refillPerSecond: 5 })

async function rateLimitedFetch(url, { maxAttempts = 5, baseMs = 500, maxMs = 20000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await limiter.acquire()
    const response = await fetch(url)

    /* 2xx and 3xx are done. 4xx other than 429 are our fault, so do not retry. */
    if (response.status !== 429 && response.status < 500) return response

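    /* Prefer the server's own retry hint when it sends one. headers.get()
       returns null for a missing header and Number(null) is 0, so rule out
       null and empty values first or every retry would wait 0ms. */
    const retryAfter = response.headers.get('retry-after')
    const seconds = retryAfter === null || retryAfter.trim() === '' ? NaN : Number(retryAfter)
    const delay = Number.isFinite(seconds) && seconds >= 0
      ? seconds * 1000
      : fullJitterDelay(attempt, baseMs, maxMs)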

    console.log(`[retry] ${url} status=${response.status} attempt=${attempt + 1} wait=${Math.round(delay)}ms`)
    await sleep(delay)
  }

  throw new Error(`Gave up on ${url} after ${maxAttempts} attempts`)
}

/* Fire 20 requests at a service. The limiter holds them to 5 per second and
   backoff handles any 429 the limiter did not prevent. */
const urls = Array.from({ length: 20 }, (_, i) => `https://httpbin.org/status/200?n=${i}`)

const results = await Promise.all(
  urls.map(async url => {
    const response = await rateLimitedFetch(url)
    return { url, status: response.status }
  })
)

console.log(`Done. ${results.filter(r => r.status === 200).length}/${results.length} succeeded`)

Run it with Node 18 or later, which provides a global fetch:

bash
node rate-limit-backoff.mjs

What each step does

Refill the bucket on a timer. setInterval adds one token every 1000 / refillPerSecond milliseconds and clamps the total at capacity. The capacity controls burst size and refillPerSecond controls the sustained rate, and the two are independent. A capacity of 5 with a refill of 5 per second lets five requests fire immediately, then settles to five per second.
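
To see how the two settings interact, the sketch below models the bucket on a simulated clock instead of a real timer. admissionTimes is a hypothetical helper, not part of the script, and it assumes every request is already waiting at time zero.

```javascript
// Hypothetical helper: returns the time (ms from start) at which each of
// `count` back-to-back requests would be admitted by a token bucket.
function admissionTimes({ capacity, refillPerSecond, count }) {
  const msPerToken = 1000 / refillPerSecond
  let tokens = capacity // the bucket starts full
  let now = 0           // simulated clock in milliseconds
  const times = []

  for (let i = 0; i < count; i++) {
    if (tokens < 1) {
      // Empty bucket: advance the clock until one token has refilled.
      now += msPerToken
      tokens += 1
    }
    tokens -= 1
    times.push(now)
  }
  return times
}

// Burst of 5, then one request every 200ms.
console.log(JSON.stringify(admissionTimes({ capacity: 5, refillPerSecond: 5, count: 8 })))
// → [0,0,0,0,0,200,400,600]

// Same sustained rate, but no burst allowance.
console.log(JSON.stringify(admissionTimes({ capacity: 1, refillPerSecond: 5, count: 4 })))
// → [0,200,400,600]
```

Changing capacity only moves the point where the spacing starts; changing refillPerSecond only changes the spacing.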

Call .unref() on the interval. A bare setInterval can keep the Node event loop alive after the request work finishes. unref() tells Node this timer should not by itself hold the process open, so the script can exit once the request work is done.
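
A quick way to check the effect, assuming Node's timer objects (hasRef() is Node-specific and does not exist on browser timers). demoUnref is a hypothetical name used only for this illustration.

```javascript
// Hypothetical demo: report whether a timer holds the event loop open
// before and after unref().
function demoUnref() {
  const timer = setInterval(() => {}, 1000)
  const before = timer.hasRef() // true: a fresh interval keeps the process alive
  timer.unref()
  const after = timer.hasRef()  // false: the interval no longer blocks exit
  clearInterval(timer)          // tidy up; the demo never needs the timer to fire
  return { before, after }
}

console.log(demoUnref())
// → { before: true, after: false }
```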

Wait for a token in acquire(). The loop polls every 25 milliseconds until a token is free, then spends one. Polling is simpler than a wakeup queue, and because the loop only spins while the bucket is empty, the 25ms interval is acceptable for small local batches. For larger or high-concurrency scripts, use a queued limiter instead.
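
For reference, a queued limiter could look roughly like the sketch below. It is one possible shape under simple assumptions (first come, first served, one token per refill tick), not the only design. The names createQueuedLimiter, refill, stop, and pending are hypothetical.

```javascript
// Sketch of a queue-based token bucket: callers that find the bucket empty
// park a resolver in `waiters` instead of polling.
function createQueuedLimiter({ capacity, refillPerSecond }) {
  let tokens = capacity
  const waiters = [] // resolve functions, oldest first

  function refill() {
    const next = waiters.shift()
    if (next) next() // hand the new token straight to the oldest waiter
    else tokens = Math.min(capacity, tokens + 1)
  }

  function acquire() {
    if (tokens >= 1) {
      tokens -= 1
      return Promise.resolve()
    }
    return new Promise(resolve => waiters.push(resolve))
  }

  // Same refill cadence as the polling version; unref() so it cannot
  // hold the process open on its own.
  const timer = setInterval(refill, 1000 / refillPerSecond)
  timer.unref()

  return {
    acquire,
    refill,                          // exposed so the refill step can be driven by hand
    stop: () => clearInterval(timer),
    pending: () => waiters.length,
  }
}
```

It is a drop-in for the polling version: await limiter.acquire() before each fetch. Waiters wake in arrival order and nothing runs while the queue is idle.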

Retry on 429 and 5xx, not on 4xx. A 429 means slow down and a 5xx is usually transient, so both are worth retrying. A 400 or 404 is a problem with the request itself, and retrying it just wastes attempts and hammers the server, so the code returns those straight through.
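
The same rule written as a standalone predicate, which may be easier to test than the inline guard. isRetryable is a hypothetical name; the script expresses the inverse condition inline.

```javascript
// True for the statuses the script retries: 429 and any 5xx.
function isRetryable(status) {
  return status === 429 || status >= 500
}

for (const status of [200, 301, 400, 404, 429, 500, 503]) {
  console.log(status, isRetryable(status) ? 'retry' : 'return to caller')
}
```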

Compute the delay with full jitter. baseMs * 2 ** attempt is the classic exponential curve, capped at maxMs so it cannot grow without bound. Multiplying by Math.random() turns the fixed delay into a random one in [0, exponential], the full-jitter strategy from the AWS architecture blog. Spreading the delay across the whole window is what de-correlates clients that backed off at the same instant.
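
To make the curve concrete, this sketch separates the capped ceiling from the random draw and prints the ceiling per attempt for the script's defaults (baseMs 500, maxMs 20000). backoffCeiling and jitteredDelay are hypothetical names, and the injectable random parameter is an addition for illustration.

```javascript
// Upper bound of the wait for a given attempt: exponential, capped at maxMs.
function backoffCeiling(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Full jitter: a uniform draw between 0 and the ceiling.
// `random` is injectable so the draw can be pinned in a test.
function jitteredDelay(attempt, baseMs, maxMs, random = Math.random) {
  return random() * backoffCeiling(attempt, baseMs, maxMs)
}

const ceilings = [0, 1, 2, 3, 4, 5, 6].map(attempt => backoffCeiling(attempt, 500, 20000))
console.log(JSON.stringify(ceilings))
// → [500,1000,2000,4000,8000,16000,20000]
```

With the default maxAttempts of 5, only attempts 0 to 4 occur, so the largest ceiling actually reached is 8000ms; the 20000ms cap matters only if maxAttempts is raised.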

Use Retry-After when the server sends a number. A numeric Retry-After is a count of seconds, so the code reads it, multiplies by 1000, and waits that long instead of guessing, falling back to jitter when the header is absent or not numeric. The date-string form of the header is not handled here; as the Gotchas note, that form needs Date.parse rather than Number.
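
If the server you hit sends the date form, a parser covering both forms might look like this sketch. parseRetryAfter is a hypothetical helper, not part of the script. It returns a wait in milliseconds, or null when the header is missing or unreadable so the caller can fall back to jitter.

```javascript
// Sketch: turn a Retry-After header value into a wait in milliseconds.
// `nowMs` is injectable so the date form can be checked deterministically.
function parseRetryAfter(value, nowMs = Date.now()) {
  if (value === null || value === undefined) return null
  const trimmed = String(value).trim()
  if (trimmed === '') return null

  // Form 1: a whole number of seconds, such as "120".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000

  // Form 2: an HTTP date, such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed)
  if (Number.isNaN(dateMs)) return null
  return Math.max(0, dateMs - nowMs) // a date in the past means "retry now"
}

const now = Date.UTC(2015, 9, 21, 7, 28, 0) // 2015-10-21 07:28:00 UTC
console.log(parseRetryAfter('120', now))                           // → 120000
console.log(parseRetryAfter('Wed, 21 Oct 2015 07:28:30 GMT', now)) // → 30000
console.log(parseRetryAfter(null, now))                            // → null
```

In the retry loop this would replace the Number() conversion: use the returned value when it is not null, and fullJitterDelay otherwise.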

Gotchas

  • The process keeps running after the requests finish.

    • Issue: setInterval registers a timer that can keep the Node event loop alive, so even after every request finishes the script may sit there instead of returning to the shell.
    • Fix: call .unref() on the interval handle so the timer does not on its own keep the process open, as the script does on the refill timer.
  • Backoff without jitter creates a retry storm.

    • Issue: a fixed baseMs * 2 ** attempt delay makes every client that hit the limit at the same moment retry at the same moment, so they collide again and keep tripping it.
    • Fix: wrap the delay in Math.random() * exponential (full jitter) so concurrent clients spread their retries across the whole window instead of bunching.
  • Retrying a 404 or 400 wastes the attempt budget.

    • Issue: a catch-all if (!response.ok) retry() retries permanent client errors, so a bad URL burns all five attempts and adds load for a request that will not succeed by retrying.
    • Fix: only retry 429 and 5xx, and return other 4xx responses straight to the caller, which is the status !== 429 && status < 500 guard above.
  • The rate cap is read off the response, so it reacts too late.

    • Issue: waiting until a 429 comes back before slowing down means you have already sent the burst that tripped the limit, and you spend the next several seconds backing off from a spike you could have prevented.
    • Fix: gate every request through limiter.acquire() before the fetch, so the steady rate is enforced at the source and backoff is only the safety net.
  • A numeric Retry-After is treated as milliseconds.

    • Issue: the Retry-After header is in seconds, so using it directly as a millisecond delay waits 1000x too short and immediately trips the limit again.
    • Fix: parse a finite number of seconds and multiply by 1000, as the script does with seconds * 1000. Note that Retry-After can also be an HTTP date string, which needs Date.parse rather than Number if the server you hit uses that form.
  • The token bucket only limits this process.

    • Issue: the in-memory bucket resets on restart and is invisible to other workers, so running four copies of the scraper sends four times the intended rate at the same API.
    • Fix: for a shared budget across processes, back the limiter with a store every worker reads, which is what rate-limiter-flexible does with Redis, or use bottleneck in clustered mode.
  • The retry loop has no overall deadline.

    • Issue: with a high maxAttempts and a large maxMs, a request to a persistently failing endpoint can sit in backoff for minutes, stalling the batch behind it.
    • Fix: add a wall-clock budget, breaking out of the loop once the accumulated wait exceeds a maxRetryTimeMs ceiling, the same cap that p-retry exposes as maxRetryTime.
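
The last gotcha, the missing overall deadline, can be sketched as a generic retry loop that tracks accumulated wait. Everything here is illustrative: retryWithBudget, the { ok, value } result shape, and the delayFor and sleep options are hypothetical names, not part of the script or of p-retry.

```javascript
// Sketch: retry `task` until it succeeds, attempts run out, or the total
// time spent waiting would exceed maxRetryTimeMs.
async function retryWithBudget(task, {
  maxAttempts = 5,
  maxRetryTimeMs = 30000,
  delayFor = attempt => Math.random() * Math.min(20000, 500 * 2 ** attempt),
  sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
} = {}) {
  let waitedMs = 0
  let attempts = 0

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    attempts += 1
    const result = await task(attempt) // assumed to resolve to { ok, value }
    if (result.ok) return result.value

    if (attempt === maxAttempts - 1) break // no point sleeping after the last try

    const delay = delayFor(attempt)
    if (waitedMs + delay > maxRetryTimeMs) break // budget exhausted: stop early
    waitedMs += delay
    await sleep(delay)
  }

  throw new Error(`Gave up after ${attempts} attempts and ${Math.round(waitedMs)}ms of waiting`)
}
```

A caller would wrap one fetch per attempt in task and map the response to { ok, value }. The budget here counts only time spent sleeping, matching the "accumulated wait" wording above; counting request time as well would need a clock reading at the start of the loop.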

Use this when

You are sending many requests to an API or a site that publishes a rate limit, and you want to stay under it while recovering automatically from the occasional 429 or transient 5xx, all in plain Node with nothing to install.

Skip this when

  • You need the limit enforced across several processes or machines, where a Redis-backed limiter like rate-limiter-flexible is the right tool.
  • You only need to cap concurrency rather than rate, where a promise pool or p-limit is simpler.
  • The failures you are retrying are scrape-level browser errors rather than HTTP status codes, where retry logic belongs in the Puppeteer layer.
  • The upstream offers a webhook or bulk endpoint, where one call replaces the whole polling loop.
