How to rate-limit requests with backoff in JavaScript
If you're getting rate limited, you're probably already seeing 429 responses come back, or watching a scraper that ran fine yesterday get throttled today. It is one of the first walls you hit once you send more than a handful of requests to the same server: go too fast and it starts refusing you, and once it has flagged you it can keep refusing for a while after you slow down.
The mechanism is to hold requests under the rate limit before fetch, then back off only on 429 and transient 5xx responses. It takes about 70 lines of plain Node.js, with nothing to install.
Key terms
- Token bucket. A rate limiter that holds up to `capacity` tokens and refills at a fixed rate, so each request spends one token and bursts past the refill rate are made to wait.
- Exponential backoff. A retry strategy that doubles the wait after each failed attempt, up to a ceiling, so a struggling server gets progressively more breathing room.
- Full jitter. Replacing each backoff delay with a random value between zero and the computed delay, so clients that failed together do not all retry at the same instant.
- Thundering herd. The collision that happens when many clients back off and retry in lockstep, hitting the server in a synchronized wave that trips the limit again.
- Retry-After. An HTTP response header, in seconds or as a date, that tells the client how long to wait before retrying.
Here is what the script does:
- Refill a token bucket on a timer so the limiter allows a steady number of requests per second and makes callers wait for a token instead of firing them all at once.
- Wrap every request in an `acquire()` call that waits for a free token, so the rate cap is enforced at the source rather than after the server complains.
- Retry a request that fails with 429 or a 5xx, doubling the wait each attempt up to a ceiling.
- Add full jitter to every backoff delay so a batch of clients that all tripped the limit at the same moment does not retry in lockstep.
- Use a numeric `Retry-After` header (a count of seconds) when the server sends one, instead of guessing the delay.
The complete script
```javascript
// rate-limit-backoff.mjs

/* A token bucket. The bucket holds up to `capacity` tokens and refills
   `refillPerSecond` of them every second. Each request spends one token.
   When the bucket is empty, acquire() waits until the next refill. */
function createRateLimiter({ capacity, refillPerSecond }) {
  let tokens = capacity
  const refillIntervalMs = 1000 / refillPerSecond

  /* Top the bucket up one token at a time on a steady interval. unref() lets
     the process exit even while this timer is technically still pending. */
  setInterval(() => {
    tokens = Math.min(capacity, tokens + 1)
  }, refillIntervalMs).unref()

  async function acquire() {
    /* Poll for a free token. The 25ms sleep is acceptable for small local
       batches; use a queued limiter for larger or high-concurrency scripts. */
    while (tokens < 1) {
      await new Promise(resolve => setTimeout(resolve, 25))
    }
    tokens -= 1
  }

  return { acquire }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

/* Full jitter: a random delay in [0, cap]. This spreads a thundering herd of
   clients that all backed off at the same instant across the whole window
   instead of bunching them at the same retry time. */
function fullJitterDelay(attempt, baseMs, maxMs) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt)
  return Math.random() * exponential
}

const limiter = createRateLimiter({ capacity: 5, refillPerSecond: 5 })

async function rateLimitedFetch(url, { maxAttempts = 5, baseMs = 500, maxMs = 20000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await limiter.acquire()
    const response = await fetch(url)

    /* 2xx and 3xx are done. 4xx other than 429 are our fault, so do not retry. */
    if (response.status !== 429 && response.status < 500) return response

    /* Prefer the server's own retry hint when it sends one. headers.get()
       returns null for a missing header and Number(null) is 0, so rule that
       out before converting, or the jitter fallback would never run. */
    const retryAfter = response.headers.get('retry-after')
    const seconds = retryAfter === null || retryAfter.trim() === ''
      ? NaN
      : Number(retryAfter)
    const delay = Number.isFinite(seconds) && seconds >= 0
      ? seconds * 1000
      : fullJitterDelay(attempt, baseMs, maxMs)

    console.log(`[retry] ${url} status=${response.status} attempt=${attempt + 1} wait=${Math.round(delay)}ms`)
    await sleep(delay)
  }
  throw new Error(`Gave up on ${url} after ${maxAttempts} attempts`)
}

/* Fire 20 requests at a service. The limiter holds them to 5 per second and
   backoff handles any 429 the limiter did not prevent. */
const urls = Array.from({ length: 20 }, (_, i) => `https://httpbin.org/status/200?n=${i}`)

const results = await Promise.all(
  urls.map(async url => {
    const response = await rateLimitedFetch(url)
    return { url, status: response.status }
  })
)

console.log(`Done. ${results.filter(r => r.status === 200).length}/${results.length} succeeded`)
```

```shell
node rate-limit-backoff.mjs
```

What each step does
Refill the bucket on a timer. setInterval adds one token every 1000 / refillPerSecond milliseconds and clamps the total at capacity. The capacity controls burst size and refillPerSecond controls the sustained rate, and the two are independent. A capacity of 5 with a refill of 5 per second lets five requests fire immediately, then settles to five per second.
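To make the burst-then-steady behaviour concrete, here is a minimal sketch that replays the same bucket arithmetic without real timers. The `simulateReleaseTimes` helper is hypothetical and not part of the script. It assumes every request is already waiting at time zero and ignores the 25ms polling granularity.

```javascript
// Hypothetical helper: replays the token-bucket arithmetic on a virtual clock.
// Assumes all requests are queued at t=0 and each refill tick adds one token.
function simulateReleaseTimes({ capacity, refillPerSecond, requestCount }) {
  const refillIntervalMs = 1000 / refillPerSecond
  const releaseTimes = []
  let tokens = capacity
  let nowMs = 0

  while (releaseTimes.length < requestCount) {
    if (tokens >= 1) {
      // A token is free: the next waiting request goes out immediately.
      tokens -= 1
      releaseTimes.push(nowMs)
    } else {
      // Bucket empty: advance to the next refill tick, which adds one token.
      nowMs += refillIntervalMs
      tokens = Math.min(capacity, tokens + 1)
    }
  }
  return releaseTimes
}

const times = simulateReleaseTimes({ capacity: 5, refillPerSecond: 5, requestCount: 8 })
console.log(times.join(', ')) // 0, 0, 0, 0, 0, 200, 400, 600
```

The first five entries are the burst allowed by `capacity`; after that, one request is released per 200ms refill tick.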
Call .unref() on the interval. A bare setInterval can keep the Node event loop alive after the request work finishes. unref() tells Node this timer should not by itself hold the process open, so the script can exit once the request work is done.
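One way to see the effect is the `hasRef()` method that Node exposes on the handle returned by `setInterval`. This is only an illustration; the empty callback and the 1000ms interval are placeholders.

```javascript
// Placeholder timer: the callback and interval are arbitrary for this demo.
const timer = setInterval(() => {}, 1000)

const before = timer.hasRef() // true: this timer would hold the event loop open
timer.unref()
const after = timer.hasRef() // false: the process may exit while it is pending

console.log({ before, after }) // { before: true, after: false }
clearInterval(timer)
```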
Wait for a token in acquire(). The loop polls every 25 milliseconds until a token is free, then spends one. Polling is simpler than a wakeup queue, and because the loop only spins while the bucket is empty, the 25ms interval is acceptable for small local batches. For larger or high-concurrency scripts, use a queued limiter instead.
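For comparison, a queued limiter might look roughly like the sketch below. `createQueuedLimiter`, `pending`, and `stop` are hypothetical names, not part of the script above. The sketch assumes first-come-first-served ordering and re-refs the refill timer while callers are waiting, because an unref'd timer alone would let the process exit with requests still queued.

```javascript
// Hypothetical queued variant: callers wait in a FIFO queue and each refill
// tick wakes exactly one of them, so nothing has to poll.
function createQueuedLimiter({ capacity, refillPerSecond }) {
  let tokens = capacity
  const waiters = [] // resolve callbacks, oldest first

  const timer = setInterval(() => {
    const next = waiters.shift()
    if (next) {
      next() // hand the fresh token straight to the longest-waiting caller
    } else {
      tokens = Math.min(capacity, tokens + 1)
    }
    // With nobody waiting, the timer should not keep the process alive.
    if (waiters.length === 0) timer.unref()
  }, 1000 / refillPerSecond)
  timer.unref()

  function acquire() {
    if (tokens >= 1) {
      tokens -= 1
      return Promise.resolve()
    }
    // A queued caller needs the timer to fire, so hold the event loop open.
    timer.ref()
    return new Promise(resolve => waiters.push(resolve))
  }

  return {
    acquire,
    pending: () => waiters.length,
    stop: () => clearInterval(timer),
  }
}

const queuedLimiter = createQueuedLimiter({ capacity: 2, refillPerSecond: 10 })
for (let i = 0; i < 4; i++) {
  queuedLimiter.acquire().then(() => console.log(`request ${i} released`))
}
console.log(`waiting: ${queuedLimiter.pending()}`) // waiting: 2
```

The trade-off is a little more bookkeeping in exchange for no wakeups while the bucket is empty and a guaranteed release order.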
Retry on 429 and 5xx, not on 4xx. A 429 means slow down and a 5xx is usually transient, so both are worth retrying. A 400 or 404 is a problem with the request itself, and retrying it just wastes attempts and hammers the server, so the code returns those straight through.
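The same rule can be written as a small predicate, which makes it easy to check against a few status codes. `shouldRetry` is a hypothetical name; it is the logical inverse of the early-return guard in `rateLimitedFetch`.

```javascript
// Hypothetical helper mirroring the script's guard: retry only 429 and 5xx.
function shouldRetry(status) {
  return status === 429 || status >= 500
}

for (const status of [200, 301, 404, 429, 503]) {
  console.log(status, shouldRetry(status) ? 'retry' : 'return to caller')
}
```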
Compute the delay with full jitter. baseMs * 2 ** attempt is the classic exponential curve, capped at maxMs so it cannot grow without bound. Multiplying by Math.random() turns the fixed delay into a random one in [0, exponential], the full-jitter strategy from the AWS architecture blog. Spreading the delay across the whole window is what de-correlates clients that backed off at the same instant.
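As a worked example, the sketch below prints the ceiling for each attempt using the script's defaults (`baseMs = 500`, `maxMs = 20000`). `backoffCeiling` is a hypothetical helper split out for illustration, and the `random` parameter is an assumption added only so the jittered value can be checked deterministically.

```javascript
// Hypothetical helper: the capped exponential ceiling before jitter is applied.
function backoffCeiling(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Same formula as the script; `random` is injectable only for this example.
function jitteredDelay(attempt, baseMs, maxMs, random = Math.random) {
  return random() * backoffCeiling(attempt, baseMs, maxMs)
}

const ceilings = [0, 1, 2, 3, 4, 5, 6].map(attempt => backoffCeiling(attempt, 500, 20000))
console.log(ceilings.join(', ')) // 500, 1000, 2000, 4000, 8000, 16000, 20000

// With a fixed "random" value of 0.5, attempt 3 waits half of its 4000ms ceiling.
console.log(jitteredDelay(3, 500, 20000, () => 0.5)) // 2000
```

The seventh attempt would be 32000ms uncapped, so `maxMs` clamps it to 20000ms. The actual wait is a uniformly random value anywhere below the ceiling.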
Use Retry-After when the server sends a number. A numeric Retry-After is a count of seconds, so the code reads it, multiplies by 1000, and waits that long instead of guessing, falling back to jitter when the header is absent or not numeric. The date-string form of the header is not handled here; as the Gotchas note, it needs Date.parse rather than Number.
Gotchas
The process keeps running after the requests finish.
- Issue: `setInterval` registers a timer that can keep the Node event loop alive, so even after every request finishes the script may sit there instead of returning to the shell.
- Fix: call `.unref()` on the interval handle so the timer does not on its own keep the process open, as the script does on the refill timer.
Backoff without jitter creates a retry storm.
- Issue: a fixed `baseMs * 2 ** attempt` delay makes every client that hit the limit at the same moment retry at the same moment, so they collide again and keep tripping it.
- Fix: wrap the delay in `Math.random() * exponential` (full jitter) so concurrent clients spread their retries across the whole window instead of bunching.
Retrying a 404 or 400 wastes the attempt budget.
- Issue: a catch-all `if (!response.ok) retry()` retries permanent client errors, so a bad URL burns all five attempts and adds load for a request that will not succeed by retrying.
- Fix: only retry 429 and 5xx, and return other 4xx responses straight to the caller, which is the `status !== 429 && status < 500` guard above.
The rate cap is read off the response, so it reacts too late.
- Issue: waiting until a 429 comes back before slowing down means you have already sent the burst that tripped the limit, and you spend the next several seconds backing off from a spike you could have prevented.
- Fix: gate every request through `limiter.acquire()` before the `fetch`, so the steady rate is enforced at the source and backoff is only the safety net.
A numeric `Retry-After` is treated as milliseconds.
- Issue: the `Retry-After` header is in seconds, so using it directly as a millisecond delay waits 1000x too short and immediately trips the limit again.
- Fix: parse a finite number of seconds and multiply by 1000, as the script does with `seconds * 1000`. Note that `Retry-After` can also be an HTTP date string, which needs `Date.parse` rather than `Number` if the server you hit uses that form.
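A parser that accepts both forms could be sketched as follows. `parseRetryAfterMs` is a hypothetical helper, not part of the script. It assumes the caller falls back to jitter when it returns `null`, and it takes the current time as a parameter so the example is reproducible.

```javascript
// Hypothetical helper: returns the wait in milliseconds, or null when the
// header is missing or unparseable so the caller can fall back to jitter.
function parseRetryAfterMs(headerValue, nowMs = Date.now()) {
  if (headerValue === null || headerValue === undefined) return null
  const value = headerValue.trim()
  if (value === '') return null

  // Seconds form: a non-negative integer count of seconds.
  if (/^\d+$/.test(value)) return Number(value) * 1000

  // Date form: wait until that moment, but never a negative amount of time.
  const dateMs = Date.parse(value)
  return Number.isNaN(dateMs) ? null : Math.max(0, dateMs - nowMs)
}

const now = Date.parse('Wed, 21 Oct 2015 07:27:00 GMT')
console.log(parseRetryAfterMs('120', now)) // 120000
console.log(parseRetryAfterMs('Wed, 21 Oct 2015 07:28:00 GMT', now)) // 60000
console.log(parseRetryAfterMs(null, now)) // null
```

Checking for `null` before converting matters because `Number(null)` is `0`, which would otherwise look like a valid zero-second hint.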
The token bucket only limits this process.
- Issue: the in-memory bucket resets on restart and is invisible to other workers, so running four copies of the scraper sends four times the intended rate at the same API.
- Fix: for a shared budget across processes, back the limiter with a store every worker reads, which is what rate-limiter-flexible does with Redis, or use bottleneck in clustered mode.
The retry loop has no overall deadline.
- Issue: with a high `maxAttempts` and a large `maxMs`, a request to a persistently failing endpoint can sit in backoff for minutes, stalling the batch behind it.
- Fix: add a wall-clock budget, breaking out of the loop once the accumulated wait exceeds a `maxRetryTimeMs` ceiling, the same cap that p-retry exposes as `maxRetryTime`.
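One possible shape for that budget is sketched below. `retryWithBudget` is a hypothetical wrapper, and its `task` is assumed to resolve to an object with an `ok` flag. The `sleep` and `random` options are assumptions added only so the demo runs without real waiting. The budget here counts backoff time only, not the time spent inside each request.

```javascript
const realSleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical wrapper: stops retrying once the summed backoff would exceed
// maxRetryTimeMs, even if attempts remain.
async function retryWithBudget(task, {
  maxAttempts = 5, baseMs = 500, maxMs = 20000, maxRetryTimeMs = 30000,
  sleep = realSleep, random = Math.random,
} = {}) {
  let waitedMs = 0
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await task(attempt)
    if (result.ok) return result

    const delay = random() * Math.min(maxMs, baseMs * 2 ** attempt)
    // Give up as soon as the next wait would push the total past the budget.
    if (waitedMs + delay > maxRetryTimeMs) break
    waitedMs += delay
    await sleep(delay)
  }
  throw new Error(`Gave up after waiting ${Math.round(waitedMs)}ms`)
}

// Demo: a task that always fails, a sleep that returns at once, and a fixed
// "random" of 1 so the delays are exactly 1000, 2000, 4000, ...
let calls = 0
const outcome = retryWithBudget(
  async () => { calls += 1; return { ok: false } },
  { maxAttempts: 10, baseMs: 1000, maxMs: 8000, maxRetryTimeMs: 5000, sleep: async () => {}, random: () => 1 },
).catch(error => `${error.message} (${calls} calls)`)

outcome.then(console.log) // Gave up after waiting 3000ms (3 calls)
```

In the demo the third backoff (4000ms) would bring the total to 7000ms, past the 5000ms budget, so the loop stops after three calls even though ten attempts were allowed.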
Use this when
You are sending many requests to an API or a site that publishes a rate limit, and you want to stay under it while recovering automatically from the occasional 429 or transient 5xx, all in plain Node with nothing to install.
Skip this when
You need the limit enforced across several processes or machines, where a Redis-backed limiter like rate-limiter-flexible is the right tool; you only need to cap concurrency rather than rate, where a promise pool or p-limit is simpler; the failures you are retrying are scrape-level browser errors rather than HTTP status codes, where retry logic belongs in the Puppeteer layer; or the upstream offers a webhook or bulk endpoint, where one call replaces the whole polling loop.