How to handle cookies and sessions when scraping in Node.js

Updated 2026-06-25 · 6 min read

If you log in once with fetch and then the next request comes back as if you are a stranger, you are hitting the thing every scraper hits eventually: fetch does not keep cookies. Each call is independent, so the Set-Cookie headers the server hands you after login are dropped on the floor, and the page behind the login wall returns the logged-out version or a redirect to the sign-in form.

The solution is to hold the cookies the server sends and replay them on the next request. That is exactly what a cookie jar does: it is an in-memory store that holds cookies per domain and path and hands back the right Cookie header for a given URL. We'll build a small script that wraps fetch with a tough-cookie jar, so each call reads the stored cookies before the request and saves any new ones after. The script posts the login form so the jar captures the session cookie that keeps you authenticated, follows up with a request that needs that session, and serializes the jar to a file so a later run starts already logged in. For browser-level scraping, we save the browser's session to a file and restore it on the next launch so you do not log in every time. The HTTP path is about 40 lines and one package, and the browser save/restore is a few lines on top of a normal launch.

The complete script

// session-fetch.mjs
import { CookieJar } from 'tough-cookie'
import { writeFile, readFile } from 'node:fs/promises'

/* one jar holds every cookie the server sets, keyed by domain and path. */
const jar = new CookieJar()

/* wrap fetch so each call sends the jar's Cookie header for this URL,
   then stores any Set-Cookie the response returns back into the jar. */
async function fetchWithJar(url, options = {}) {
  const cookieHeader = await jar.getCookieString(url)
  const headers = {
    'User-Agent': 'Mozilla/5.0',
    ...options.headers,
    /* only attach Cookie when the jar actually has one for this URL. */
    ...(cookieHeader ? { Cookie: cookieHeader } : {})
  }

  /* redirect: 'manual' so we can read the Set-Cookie on a 302 ourselves.
     with auto-follow, fetch only returns the final response, so headers on
     the intermediate redirect are never exposed. we follow it by hand below. */
  const response = await fetch(url, { ...options, headers, redirect: 'manual' })

  /* getSetCookie() returns the Set-Cookie headers as an array (Node 18.14+),
     so multiple cookies on one response are not collapsed into one string. */
  for (const cookie of response.headers.getSetCookie()) {
    await jar.setCookie(cookie, response.url || url)
  }

  /* follow the redirect by hand, carrying the freshly-updated jar with us.
     the call recurses, so a chain of redirects is followed hop by hop. */
  const location = response.headers.get('location')
  if (response.status >= 300 && response.status < 400 && location) {
    return fetchWithJar(new URL(location, url).href, { ...options, method: 'GET', body: undefined })
  }

  return response
}

const base = 'https://practice.expandtesting.com'

/* 1. log in. the server replies with a session cookie that the jar captures. */
await fetchWithJar(`${base}/login`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({ username: 'practice', password: 'SuperSecretPassword!' })
})

/* 2. hit a page that needs the session. the jar attaches the cookie for us. */
const secure = await fetchWithJar(`${base}/secure`)
const html = await secure.text()
console.log('logged in:', html.includes('You logged into a secure area'))

/* 3. persist the session so a later run skips the login step entirely. */
await writeFile('session.json', JSON.stringify(jar.serializeSync()))

/* on the next run, rehydrate instead of logging in again. these two lines
   replace the `const jar = new CookieJar()` line near the top:
   const saved = JSON.parse(await readFile('session.json', 'utf8'))
   const jar = CookieJar.deserializeSync(saved)
*/

Install the dependency and run the script (bash):

npm install tough-cookie
node session-fetch.mjs

How it works

Create one jar and share it across requests. tough-cookie's CookieJar is the canonical RFC 6265 store: it tracks each cookie's domain, path, expiry, and Secure and SameSite flags, and it hands back only the cookies that match the URL you are about to request. One jar per session is the whole trick. Reuse it for every call in the flow.
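
To make "only the cookies that match the URL" concrete, here is a minimal sketch of the RFC 6265 domain-match and path-match rules that a jar applies when it picks cookies for a request. The helper names domainMatches and pathMatches are made up for this illustration and are not tough-cookie's API. The real library also checks expiry, the Secure flag, host-only cookies, and IP-address hosts, all of which this sketch leaves out.

```javascript
// Illustration only: simplified RFC 6265 matching (sections 5.1.3 and 5.1.4).
// domainMatches and pathMatches are hypothetical names, not tough-cookie functions.

function domainMatches(requestHost, cookieDomain) {
  const host = requestHost.toLowerCase()
  const domain = cookieDomain.toLowerCase().replace(/^\./, '')
  // Either the same host, or a subdomain of the cookie's domain.
  return host === domain || host.endsWith('.' + domain)
}

function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true
  if (!requestPath.startsWith(cookiePath)) return false
  // A prefix only counts at a "/" boundary, so /secure does not match /secure-notes.
  return cookiePath.endsWith('/') || requestPath[cookiePath.length] === '/'
}

const url = new URL('https://shop.example.test/secure/orders')
console.log(domainMatches(url.hostname, 'example.test'))      // true
console.log(pathMatches(url.pathname, '/secure'))             // true
console.log(pathMatches('/secure-notes', '/secure'))          // false
console.log(domainMatches('notexample.test', 'example.test')) // false
```

This is why one shared jar is safe to reuse across a whole flow: the matching step, not your code, decides which cookies go on each request.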

Read before the request, write after. jar.getCookieString(url) builds the Cookie header for that exact URL, so a cookie scoped to /secure does not leak onto a request for another path. After the response lands, response.headers.getSetCookie() returns the Set-Cookie headers as an array, and each one goes back into the jar with jar.setCookie(). The native Headers.get('set-cookie') collapses multiple cookies into one comma-joined string. Because an Expires date contains a comma of its own, that string cannot be split back into individual cookies reliably, which is why getSetCookie() exists.
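
The difference between the two header accessors is easy to see without a network call. The sketch below builds a Headers object by hand with two made-up cookies (the names sid and theme are placeholders) and compares what get() and getSetCookie() return. It assumes a Node version where getSetCookie() is available.

```javascript
// Two Set-Cookie headers, one of which has an Expires date containing a comma.
const headers = new Headers()
headers.append('Set-Cookie', 'sid=abc123; Path=/; Expires=Wed, 21 Oct 2026 07:28:00 GMT')
headers.append('Set-Cookie', 'theme=dark; Path=/')

// get() joins every value with ", ", so the comma in the date becomes ambiguous.
const joined = headers.get('set-cookie')
const naiveSplit = joined.split(', ')
console.log(naiveSplit.length) // 3: the first cookie was cut in two at its date

// getSetCookie() keeps one array entry per header, with nothing to split.
const cookies = headers.getSetCookie()
console.log(cookies.length)    // 2
console.log(cookies[0])        // sid=abc123; Path=/; Expires=Wed, 21 Oct 2026 07:28:00 GMT
```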

Handle the post-login redirect by hand. Many login forms answer with a 302 to the landing page and set the session cookie on that 302 response. With redirect: 'manual' you read the Set-Cookie off the redirect before following it, then re-request the Location as a GET carrying the now-populated jar. Let fetch auto-follow instead and you only ever see the final response, so a cookie set on the intermediate 302 never reaches the jar.
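
The script switches every redirect to a GET, which suits the usual 302 after a form login. If a site answers with a 307 or 308 instead, the method and body are meant to be repeated unchanged. The sketch below shows one way to decide the next hop following the same rules fetch applies when it auto-follows. The helper name planRedirect and its return shape are hypothetical, not part of fetch or tough-cookie.

```javascript
// Hypothetical helper: work out the next request after a manual redirect.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308])

function planRedirect(status, location, currentUrl, method = 'GET') {
  if (!REDIRECT_STATUSES.has(status) || !location) return null

  // Location may be relative ("/secure"), so resolve it against the current URL.
  const url = new URL(location, currentUrl).href
  const upper = method.toUpperCase()

  // 307 and 308 repeat the request as-is.
  if (status === 307 || status === 308) return { url, method: upper, keepBody: true }

  // 303 turns anything except GET/HEAD into a GET; 301 and 302 do so for POST.
  const becomesGet = status === 303
    ? upper !== 'GET' && upper !== 'HEAD'
    : upper === 'POST'

  return becomesGet
    ? { url, method: 'GET', keepBody: false }
    : { url, method: upper, keepBody: true }
}

console.log(planRedirect(302, '/secure', 'https://example.test/login', 'POST'))
// { url: 'https://example.test/secure', method: 'GET', keepBody: false }
```

A cap on the number of hops is also worth adding in a real scraper, since a manual follow loop has no built-in limit.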

Serialize the jar to keep the session. jar.serializeSync() returns a plain object you can write to disk as JSON. CookieJar.deserializeSync() rebuilds the same jar on the next run, so the second run starts already authenticated and skips the login POST until the session cookie expires. That expiry is the catch on long-running jobs: once the server invalidates the session, replaying the same cookie just comes back logged-out, so treat a logged-out response (a redirect to /login or a missing post-auth marker) as a signal to log in again and refresh the jar. And the file is single-owner: point several concurrent workers at one session.json and they overwrite each other's cookies and interleave logins, which can invalidate the session server-side, so give each worker its own jar and session file, or establish one jar before the workers fan out.
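
The "treat a logged-out response as a signal" step can be sketched as a small check that runs after each protected request. The function name looksLoggedOut and its options are hypothetical. The default marker text and the /login path come from the practice site used in the script and would differ on other sites.

```javascript
// Hypothetical check: does this response look like the session has expired?
// `url` is the final URL after redirects (response.url), `html` the body text.
function looksLoggedOut(
  { status, url, html },
  { loginPath = '/login', marker = 'You logged into a secure area' } = {}
) {
  // An explicit auth failure from the server.
  if (status === 401 || status === 403) return true

  // We asked for a protected page but ended up on the sign-in form.
  if (new URL(url).pathname.startsWith(loginPath)) return true

  // The page loaded, but the text that only appears after login is missing.
  return !html.includes(marker)
}

console.log(looksLoggedOut({
  status: 200,
  url: 'https://example.test/secure',
  html: '<p>You logged into a secure area!</p>'
})) // false

console.log(looksLoggedOut({
  status: 200,
  url: 'https://example.test/login',
  html: '<form>...</form>'
})) // true
```

When the check returns true, repeat the login POST with the same jar and write session.json again, so the next run picks up the fresh cookie.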

For the browser case, the same idea lives behind a built-in API. With Playwright, await context.storageState({ path: 'state.json' }) writes the cookies and localStorage to disk after you log in, and launching with browser.newContext({ storageState: 'state.json' }) restores them so the next run starts authenticated. With Puppeteer, read await page.cookies() to a file after login and replay them with await page.setCookie(...saved) before navigating. Both let you log in once interactively and reuse that session across runs. This is also the path to take when a site keeps its auth token in localStorage rather than a cookie: the fetch jar captures nothing there, so storageState, which carries both cookies and localStorage, is what authenticates.

Use this when

You need to stay logged in across requests while scraping in Node.js: a site behind a form login, an API that hands back a session cookie, or any flow where page two depends on a cookie set on page one. The fetch jar covers HTTP-level scraping; the browser save and restore (storageState in Playwright, saved cookies in Puppeteer) covers browser-level scraping.

Skip this when

The site has no login and serves the same content to anonymous requests (drop the jar and fetch directly); the auth is a bearer token you already hold (send it as an Authorization header, no jar needed); the login is gated by a CAPTCHA or a Cloudflare challenge (solve that in a browser first, then export the cookie); or the token lives only in localStorage (use the browser storageState path rather than the fetch jar).
