How to retry failed scrapes with exponential backoff

Updated 2026-06-24 · 5 min read

If you've run scrapers for any length of time, you've probably lost a whole job to a single transient hiccup: a momentary 429, a 503 during a deploy, or one dropped connection that would have worked on a second try. A flat retry loop on a fixed delay does not help much. Three tries one second apart hit a struggling server again within a couple of seconds and push it further down, and a fleet of workers all looping on the same delay arrives in synchronized waves that look like an attack. This is a common failure mode at scale, and spaced-out retries handle it.

The solution is to wrap the fetch in exponential backoff, where each retry waits longer than the last, and to add jitter that randomizes those waits so parallel workers don't retry in lockstep. We'll build a small script that does four things. It throws on the responses worth a second try, so a momentarily overloaded server gets widening room to recover. It spaces each attempt on an exponential curve with base delays of 1s, 2s, 4s, 8s. It spreads those waits with jitter so a hundred parallel scrapers don't all retry on the same tick. And it filters the errors so a permanent 404 or 403 fails fast instead of burning the budget. That takes about 40 lines of Node.js with one open-source library, p-retry.

The complete script

// retry-scrape.mjs
import pRetry, { AbortError } from 'p-retry'

const url = 'https://httpbin.org/status/200'

// http statuses that are worth retrying. a 429 or a 5xx is the server
// asking us to back off; a 404 or 403 will never become a 200 on retry.
const RETRYABLE_STATUS = new Set([408, 425, 429, 500, 502, 503, 504])

async function scrapeOnce(url) {
  const res = await fetch(url, {
    headers: { 'User-Agent': 'Mozilla/5.0' },
    // cap each attempt so a hung connection fails fast instead of
    // burning the whole retry budget on one stalled request.
    signal: AbortSignal.timeout(15_000)
  })

  if (!res.ok) {
    // AbortError tells p-retry to stop immediately, no more attempts.
    if (!RETRYABLE_STATUS.has(res.status)) {
      throw new AbortError(`Permanent HTTP ${res.status} for ${url}`)
    }
    throw new Error(`Retryable HTTP ${res.status} for ${url}`)
  }

  return res.text()
}

const html = await pRetry(() => scrapeOnce(url), {
  retries: 4,            // up to 4 retries after the first attempt
  factor: 2,            // double the base wait each time: 1s, 2s, 4s, 8s (before jitter)
  minTimeout: 1_000,    // first backoff is 1 second
  maxTimeout: 30_000,   // never wait more than 30 seconds between tries
  randomize: true,      // jitter the delays so parallel workers desync
  onFailedAttempt: ({ error, attemptNumber, retriesLeft }) => {
    // p-retry passes a context object: the thrown error plus attempt metadata.
    console.log(`attempt ${attemptNumber} failed: ${error.message} (${retriesLeft} left)`)
  }
})

console.log(`Got ${html.length} bytes`)

Install the dependency and run the script:

npm install p-retry
node retry-scrape.mjs

How it works

Define what is retryable. RETRYABLE_STATUS is a Set of the status codes where a retry can plausibly succeed: request timeout, too-early, rate-limited, and the 5xx server errors. Everything else is treated as final. Using a lookup set rather than a range check keeps the policy in one place and easy to audit.

Throw to signal a retry. p-retry decides whether to retry by whether your function rejects. A resolved promise means success and stops the loop, so scrapeOnce must throw on a bad status rather than returning the failed response. A fetch that gets a 503 resolves normally with res.ok === false, so you have to inspect the status yourself.

Abort on permanent failures. Throwing a plain Error tells p-retry to retry. Throwing an AbortError tells it to stop now and reject with that error. A 404 or 403 goes through the AbortError branch so the loop exits on the first attempt instead of waiting out the full backoff schedule.
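To make the three outcomes concrete, here is a minimal sketch that names the branch scrapeOnce would take for a given response. classifyResponse is a hypothetical helper for illustration, not part of p-retry, and the example assumes a Node.js version where Response is a global (Node 18 or later). It builds responses locally, so nothing goes over the network.

```javascript
// Same policy set as the main script.
const RETRYABLE_STATUS = new Set([408, 425, 429, 500, 502, 503, 504])

// Hypothetical helper: report which branch scrapeOnce would take.
function classifyResponse(res) {
  if (res.ok) return 'success'                  // 2xx: return the body
  return RETRYABLE_STATUS.has(res.status)
    ? 'retry'                                   // throw a plain Error
    : 'abort'                                   // throw an AbortError
}

// A non-2xx Response is an ordinary object with ok === false;
// nothing is thrown until the code inspects the status itself.
for (const status of [200, 404, 429, 503]) {
  const res = new Response(null, { status })
  console.log(status, res.ok, classifyResponse(res))
}
```

Keeping the decision in one small function like this also makes the policy easy to unit test without touching the network.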

Tune the backoff curve. factor: 2 with minTimeout: 1000 produces base delays of 1s, 2s, 4s, 8s. maxTimeout: 30000 clamps the upper end so a high retry count cannot schedule a multi-minute sleep. randomize: true applies the jitter by multiplying each delay by a random factor between 1 and 2, so the actual waits land between the base value and double it, and parallel workers don't back off by the identical delay and arrive in synchronized bursts. retries: 4 means five total attempts, the first try plus four retries.
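To preview the waits a set of options implies, the sketch below recomputes the schedule by hand. backoffSchedule is a hypothetical helper, not a p-retry export, and it assumes the delay formula documented for the retry package: min(random × minTimeout × factor^attempt, maxTimeout), with random between 1 and 2 when randomize is on. Treat it as an approximation of what the library does, not its actual source.

```javascript
// Hypothetical helper: list the wait before each retry, in milliseconds.
// `random` is injectable so the jitter can be pinned when testing.
function backoffSchedule(options, random = Math.random) {
  const { retries, factor, minTimeout, maxTimeout, randomize } = options
  const delays = []
  for (let attempt = 0; attempt < retries; attempt++) {
    // With randomize on, scale the base delay by a factor in [1, 2).
    const jitter = randomize ? 1 + random() : 1
    const raw = Math.round(jitter * minTimeout * Math.pow(factor, attempt))
    // Clamp so a long schedule never exceeds maxTimeout.
    delays.push(Math.min(raw, maxTimeout))
  }
  return delays
}

const options = { retries: 4, factor: 2, minTimeout: 1_000, maxTimeout: 30_000 }

console.log(backoffSchedule({ ...options, randomize: false }))
// → [ 1000, 2000, 4000, 8000 ]

console.log(backoffSchedule({ ...options, randomize: true }))
// different on every run; each entry falls between the base delay and twice it
```

Under that assumption, the worst case for this configuration is roughly 2 + 4 + 8 + 16 = 30 seconds of waiting, plus up to 15 seconds per attempt for the fetch itself.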

Cap each attempt with a timeout. signal: AbortSignal.timeout(15_000) aborts any single fetch that stalls past 15 seconds, which turns it into a retryable failure quickly. Without it, a hung connection can sit until the platform's default timeout, often around 300 seconds, before the next retry gets a chance to fire.

Honor Retry-After on a 429. The exponential curve picks its own delays, so when a server explicitly says Retry-After: 30, a 1s backoff just earns another 429. If you scrape rate-limited APIs, read res.headers.get('retry-after') in the failed branch and sleep that long before throwing. The header can carry either a number of seconds or an HTTP date, so handle both forms.
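One way to do that is sketched below. retryAfterMs and waitForRetryAfter are hypothetical helpers, not part of p-retry or fetch, and the 60-second cap is an assumed safety limit you should tune. The parser accepts both header forms, a count of seconds or an HTTP date, and returns null when the header is missing or unreadable so the caller can fall back to the normal backoff.

```javascript
// Hypothetical helper: convert a Retry-After value to milliseconds.
// Returns null when the header is absent or cannot be parsed.
function retryAfterMs(headerValue, now = Date.now()) {
  if (headerValue == null) return null
  const value = String(headerValue).trim()
  // Form 1: delay in whole seconds, e.g. "30".
  if (/^\d+$/.test(value)) return Number(value) * 1000
  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const when = Date.parse(value)
  if (Number.isNaN(when)) return null
  return Math.max(0, when - now) // a date in the past means no wait
}

// Hypothetical helper: call in the retryable branch of scrapeOnce,
// just before throwing the plain Error.
async function waitForRetryAfter(res, capMs = 60_000) {
  const wait = retryAfterMs(res.headers.get('retry-after'))
  if (wait === null) return
  // Cap the sleep so a hostile or mistaken header cannot stall the worker.
  await new Promise(resolve => setTimeout(resolve, Math.min(wait, capMs)))
}

console.log(retryAfterMs('30'))   // → 30000
console.log(retryAfterMs(null))   // → null
```

Note that this sleep happens in addition to p-retry's own backoff delay, so the total wait is the server's hint plus the next step on the curve.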

Watch each failure. onFailedAttempt fires after every failed try with a context object carrying the thrown error, the attemptNumber, and retriesLeft, which makes it the right place for logging. Throwing inside this callback aborts the whole retry, so keep it to observation.

Only retry what is safe to repeat. Retries assume the request can be sent again without harm. That holds for reads, but a retried POST that submits a form or records a job can apply its side effect twice, so only wrap writes when the server deduplicates on a request key.

Use this when

You are scraping at scale and want a single page fetch to survive a transient hiccup, such as a momentary 429, a 503 during a deploy, or a dropped connection, without failing the whole job or hand-rolling a retry loop per call site.

Skip this when

The failure is permanent and retrying cannot fix it, such as a 404 or a 403 (fix the URL or the auth); you need to cap total parallel load rather than handle one call's failures (use a rate limiter like p-throttle); a host keeps failing and you want to stop sending it traffic entirely (use a circuit breaker like cockatiel or opossum); or the work must persist across a process restart (use a job queue like BullMQ).
