How to rate-limit requests with backoff in JavaScript

Updated 2026-06-24 · 6 min read

If you're getting rate limited, you're probably already seeing 429 responses come back, or watching a scraper that ran fine yesterday get throttled today. It is one of the first walls you hit once you send more than a handful of requests to the same server: go too fast and it starts refusing you, and once it has flagged you it can keep refusing for a while after you slow down.

The fix is to hold requests under the rate limit before they are sent, then back off only on the responses that ask you to. We'll build a small script that does three things. It refills a token bucket on a timer, so callers wait for a free slot instead of firing everything at once. It retries only the 429 and transient 5xx responses that are worth retrying. And it spreads each retry with jitter, so a batch that tripped the limit at the same moment doesn't come back in lockstep, deferring to the server's own Retry-After hint whenever it sends one. It's about 70 lines of plain Node.js with nothing to install.

The complete script

// rate-limit-backoff.mjs

/* a token bucket. the bucket holds up to `capacity` tokens and refills
   `refillPerSecond` of them every second. each request spends one token.
   when the bucket is empty, acquire() waits until the next refill. */
function createRateLimiter({ capacity, refillPerSecond }) {
  let tokens = capacity
  const refillIntervalMs = 1000 / refillPerSecond

  /* top the bucket up one token at a time on a steady interval. unref() lets
     the process exit even while this timer is technically still pending. */
  setInterval(() => {
    tokens = Math.min(capacity, tokens + 1)
  }, refillIntervalMs).unref()

  async function acquire() {
    /* poll for a free token. the 25ms sleep is acceptable for small local
       batches; use a queued limiter for larger or high-concurrency scripts. */
    while (tokens < 1) {
      await new Promise(resolve => setTimeout(resolve, 25))
    }
    tokens -= 1
  }

  return { acquire }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

/* full jitter: a random delay in [0, cap), where cap is the exponential
   backoff clamped to maxMs. this spreads out clients that all backed off at
   the same instant across the whole window, instead of bunching them at the
   same retry time. */
function fullJitterDelay(attempt, baseMs, maxMs) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt)
  return Math.random() * exponential
}

const limiter = createRateLimiter({ capacity: 5, refillPerSecond: 5 })

async function rateLimitedFetch(url, { maxAttempts = 5, baseMs = 500, maxMs = 20000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await limiter.acquire()
    const response = await fetch(url)

    /* 2xx and 3xx are done. 4xx other than 429 are our fault, so do not retry. */
    if (response.status !== 429 && response.status < 500) return response

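    /* prefer the server's own retry hint when it sends one. headers.get()
       returns null when the header is absent, and Number(null) is 0, so rule
       out a missing or empty value before parsing it as a number. */
    const retryAfter = response.headers.get('retry-after')
    const seconds = retryAfter === null || retryAfter.trim() === '' ? NaN : Number(retryAfter)
    const delay = Number.isFinite(seconds) && seconds >= 0
      ? seconds * 1000
      : fullJitterDelay(attempt, baseMs, maxMs)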

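    /* no point waiting after the final attempt; fall through to the throw. */
    if (attempt === maxAttempts - 1) break

    console.log(`[retry] ${url} status=${response.status} attempt=${attempt + 1} wait=${Math.round(delay)}ms`)
    await sleep(delay)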
  }

  throw new Error(`Gave up on ${url} after ${maxAttempts} attempts`)
}

/* fire 20 requests at a service. the limiter holds them to 5 per second and
   backoff handles any 429 the limiter did not prevent. */
const urls = Array.from({ length: 20 }, (_, i) => `https://httpbin.org/status/200?n=${i}`)

const results = await Promise.all(
  urls.map(async url => {
    const response = await rateLimitedFetch(url)
    return { url, status: response.status }
  })
)

console.log(`Done. ${results.filter(r => r.status === 200).length}/${results.length} succeeded`)

Run it with Node 18 or later, where fetch is available globally:

node rate-limit-backoff.mjs

How it works

Refill the bucket on a timer. setInterval adds one token every 1000 / refillPerSecond milliseconds and clamps the total at capacity. The capacity controls burst size and refillPerSecond controls the sustained rate, and the two are independent. A capacity of 5 with a refill of 5 per second lets five requests fire immediately, then settles to five per second.
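
To make the burst-then-steady behaviour concrete, here is a minimal sketch that works out when each request would get its token if a batch arrived all at once. The helper name earliestSendTimes is hypothetical and not part of the script above, and the arithmetic ignores the 25ms polling granularity of acquire().

```javascript
// Hypothetical helper, not part of the script above: for `count` requests that
// all arrive at t=0, return the time in ms at which each one gets a token.
function earliestSendTimes(count, { capacity, refillPerSecond }) {
  const refillIntervalMs = 1000 / refillPerSecond
  return Array.from({ length: count }, (_, i) =>
    // the first `capacity` requests spend the initial tokens immediately;
    // each later one waits for the next refill tick
    i < capacity ? 0 : (i - capacity + 1) * refillIntervalMs
  )
}

const times = earliestSendTimes(8, { capacity: 5, refillPerSecond: 5 })
console.log(times.join(', ')) // 0, 0, 0, 0, 0, 200, 400, 600
```

Raising capacity alone widens the initial burst without changing the 200ms spacing that follows, and raising refillPerSecond alone tightens the spacing without changing the burst.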

Call .unref() on the interval. A bare setInterval can keep the Node event loop alive after the request work finishes. unref() tells Node this timer should not by itself hold the process open, so the script can exit once the request work is done.

Wait for a token in acquire(). The loop polls every 25 milliseconds until a token is free, then spends one. Polling is simpler than a wakeup queue, and because the loop only spins while the bucket is empty, the 25ms interval is acceptable for small local batches; larger or high-concurrency scripts should use a queued limiter. Gating here, before the fetch, is the point: if you read the cap off the response instead, you only slow down after the burst that tripped the limit has already gone out. The bucket also lives in this process only, so running four copies of the script sends four times the rate at the same API; for a shared budget across workers, back the limiter with a store every worker reads, such as Redis.
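
For reference, a queued variant might look roughly like the sketch below. It is one possible shape rather than a drop-in from any library: createQueuedRateLimiter and its waiters array are names made up here. Callers that find the bucket empty park a resolver in a FIFO queue, and each refill tick hands its token straight to the oldest waiter instead of having every caller poll.

```javascript
// Sketch of a queue-based variant of createRateLimiter (hypothetical name).
// Same token bucket, but waiting callers are woken in order by the refill
// timer instead of polling every 25ms.
function createQueuedRateLimiter({ capacity, refillPerSecond }) {
  let tokens = capacity
  const waiters = [] // resolve functions of callers waiting for a token

  const timer = setInterval(() => {
    const next = waiters.shift()
    if (next) {
      next() // hand the fresh token straight to the oldest waiter
      if (waiters.length === 0) timer.unref() // nobody left waiting
    } else {
      tokens = Math.min(capacity, tokens + 1)
    }
  }, 1000 / refillPerSecond)
  timer.unref()

  function acquire() {
    if (tokens >= 1) {
      tokens -= 1
      return Promise.resolve()
    }
    return new Promise(resolve => {
      waiters.push(resolve)
      // there is no polling timeout keeping the process alive here, so the
      // interval itself must hold the event loop open while callers wait
      timer.ref()
    })
  }

  return { acquire }
}
```

The ref() and unref() toggling matters: with the interval permanently unref'd and no polling timers, Node could exit while callers are still waiting in the queue.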

Retry on 429 and 5xx, not on other 4xx. A 429 means slow down and a 5xx is usually transient, so both are worth retrying. A 400 or 404 is a problem with the request itself, and retrying it just wastes attempts and hammers the server, so the code returns those straight through.

Compute the delay with full jitter. baseMs * 2 ** attempt doubles the wait on each attempt, and maxMs caps it so it cannot grow without bound. Multiplying by Math.random() turns that into a random wait between zero and the cap, which spreads out clients that all backed off at the same instant so they do not come back in lockstep and trip the limit again. The loop has no overall deadline, so a high maxAttempts and a large maxMs can leave a failing endpoint backing off for minutes; add a wall-clock budget that breaks the loop once the total wait passes a ceiling.
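
One way to add that wall-clock budget is sketched below. The wrapper retryWithBudget, its attemptOnce callback, and the budgetMs option are all assumptions made for illustration; the script above has none of them. The loop stops as soon as the next sleep would carry it past the deadline, whichever of the attempt count or the time budget runs out first.

```javascript
// full jitter delay, same formula as the script above
function fullJitterDelay(attempt, baseMs, maxMs) {
  return Math.random() * Math.min(maxMs, baseMs * 2 ** attempt)
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical wrapper: `attemptOnce` is any async function that returns
// { done: true, value } on success or { done: false } when a retry is wanted.
// `budgetMs` is an assumed option, not part of the original script.
async function retryWithBudget(
  attemptOnce,
  { maxAttempts = 5, baseMs = 500, maxMs = 20000, budgetMs = 30000 } = {}
) {
  const deadline = Date.now() + budgetMs

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await attemptOnce(attempt)
    if (result.done) return result.value

    const delay = fullJitterDelay(attempt, baseMs, maxMs)
    // stop if sleeping would carry us past the overall deadline
    if (Date.now() + delay > deadline) break
    await sleep(delay)
  }

  throw new Error('Gave up: attempts or time budget exhausted')
}
```

In rateLimitedFetch the same check would sit just before the sleep call, with the deadline computed once before the loop starts.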

Use Retry-After when the server sends a number. A numeric Retry-After is a count of seconds, so the code reads it, multiplies by 1000, and waits that long instead of guessing, falling back to jitter when the header is absent or not numeric. Retry-After can also arrive as an HTTP date string rather than a number, which needs Date.parse instead of Number if the server you hit uses that form.
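
A helper that accepts both forms could look like the sketch below. The name retryAfterToMs is hypothetical, and the helper is stricter than the script above in one respect: it treats only a non-negative integer as the seconds form. A null result means the caller should fall back to jitter.

```javascript
// Hypothetical helper: turn a Retry-After header value into milliseconds,
// or null when the header is missing or cannot be parsed.
function retryAfterToMs(headerValue, now = Date.now()) {
  if (headerValue === null || headerValue === undefined) return null
  const value = headerValue.trim()
  if (value === '') return null

  // seconds form: a non-negative integer, e.g. "120"
  if (/^\d+$/.test(value)) return Number(value) * 1000

  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(value)
  if (Number.isNaN(date)) return null
  return Math.max(0, date - now) // a date in the past means retry now
}

console.log(retryAfterToMs('120')) // 120000
console.log(retryAfterToMs(null)) // null
```

The now parameter exists only so the date branch can be exercised with a fixed clock; in the retry loop you would call it as retryAfterToMs(response.headers.get('retry-after')).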

Use this when

You are sending many requests to an API or a site that publishes a rate limit, and you want to stay under it while recovering automatically from the occasional 429 or transient 5xx, all in plain Node with nothing to install.

Skip this when

You need the limit enforced across several processes or machines, where a Redis-backed limiter like rate-limiter-flexible is the right tool; you only need to cap concurrency rather than rate, where a promise pool or p-limit is simpler; the failures you are retrying are scrape-level browser errors rather than HTTP status codes, where retry logic belongs in the Puppeteer layer; or the upstream offers a webhook or bulk endpoint, where one call replaces the whole polling loop.
