Simplescraper

How to handle cookies and sessions when scraping in Node.js

Updated 2026-06-25 · 6 min read

If you log in once with fetch and then the next request comes back as if you are a stranger, you are hitting the thing every scraper hits eventually: fetch does not keep cookies. Each call is independent, so the Set-Cookie headers the server hands you after login are dropped on the floor, and the page behind the login wall returns the logged-out version or a redirect to the sign-in form.

The solution is to hold the cookies the server sends and replay them on the next request, which is exactly what a cookie jar does. For HTTP-level scraping you wire a tough-cookie jar into fetch, and for browser-level scraping you save the browser's session to a file and restore it on the next run so you do not log in every time. The HTTP path is about 40 lines and the browser save/restore is a few lines on top of a normal launch.

Key terms

  • Cookie jar. An in-memory store that holds cookies per domain and path, applies expiry and Secure rules, and gives you back the right Cookie header for a given URL. tough-cookie is the canonical implementation in the Node ecosystem.
  • Session. The server's record that you are logged in, keyed by a session cookie it set after login. Replaying that cookie on later requests is what keeps you authenticated.
  • storageState. The cookies plus localStorage that Playwright can serialize to JSON and reload, so a browser starts a run already logged in without repeating the login flow. Puppeteer has no direct equivalent; there you save and restore the cookies yourself.

Here is what the script does:

  • Create a tough-cookie CookieJar and a small wrapper around fetch that reads the Cookie header out of the jar before each request and writes any Set-Cookie back into it.
  • Post the login form, let the wrapper capture the session cookie the server returns, and follow up with an authenticated request that the jar now carries the cookie for.
  • Serialize the jar to JSON so a later run can rehydrate the same session without logging in again.
  • For the browser case, dump Playwright or Puppeteer cookies to a file and feed them back on the next launch.

The complete script

js
// session-fetch.mjs
import { CookieJar } from 'tough-cookie'
import { writeFile, readFile } from 'node:fs/promises'

/* One jar holds every cookie the server sets, keyed by domain and path. */
const jar = new CookieJar()

/* Wrap fetch so each call sends the jar's Cookie header for this URL,
   then stores any Set-Cookie the response returns back into the jar. */
async function fetchWithJar(url, options = {}) {
  const cookieHeader = await jar.getCookieString(url)
  const headers = {
    'User-Agent': 'Mozilla/5.0',
    ...options.headers,
    /* Only attach Cookie when the jar actually has one for this URL. */
    ...(cookieHeader ? { Cookie: cookieHeader } : {})
  }

  /* redirect: 'manual' so a 302 to a logged-in page does not strip the
     Set-Cookie before we read it. We follow the redirect ourselves. */
  const response = await fetch(url, { ...options, headers, redirect: 'manual' })

  /* getSetCookie() returns the Set-Cookie headers as an array (Node 18.14+),
     so multiple cookies on one response are not collapsed into one string. */
  for (const cookie of response.headers.getSetCookie()) {
    await jar.setCookie(cookie, response.url || url)
  }

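  /* Follow redirects by hand, carrying the freshly-updated jar with us. The
     call recurses, so each hop in a redirect chain gets the same treatment.
     The follow-up is always a GET with no body, which suits the usual
     302/303 after a login POST but would be wrong for a 307 or 308. */
  const location = response.headers.get('location')
  if (response.status >= 300 && response.status < 400 && location) {
    return fetchWithJar(new URL(location, url).href, { ...options, method: 'GET', body: undefined })
  }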

  return response
}

const base = 'https://practice.expandtesting.com'

/* 1. Log in. The server replies with a session cookie that the jar captures. */
await fetchWithJar(`${base}/login`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({ username: 'practice', password: 'SuperSecretPassword!' })
})

/* 2. Hit a page that needs the session. The jar attaches the cookie for us. */
const secure = await fetchWithJar(`${base}/secure`)
const html = await secure.text()
console.log('logged in:', html.includes('You logged into a secure area'))

/* 3. Persist the session so a later run skips the login step entirely. */
await writeFile('session.json', JSON.stringify(jar.serializeSync()))

/* On the next run, rehydrate instead of logging in again:
   const saved = JSON.parse(await readFile('session.json', 'utf8'))
   const jar = CookieJar.deserializeSync(saved)
*/

bash
npm install tough-cookie
node session-fetch.mjs

What each step does

Create one jar and share it across requests. tough-cookie's CookieJar is the canonical RFC 6265 store: it tracks each cookie's domain, path, expiry, and Secure and SameSite flags, and it hands back only the cookies that match the URL you are about to request. One jar per session is the whole trick. Reuse it for every call in the flow.
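
To make "only the cookies that match the URL" concrete, here is a toy sketch of the domain and path matching rules from RFC 6265. It is not tough-cookie's code: the function names and the example.test cookies are made up for illustration, and it ignores expiry, Secure, host-only cookies, and IP addresses, all of which a real jar handles.

```javascript
/* Toy versions of the RFC 6265 matching rules. Illustrative only. */

/* A host matches a cookie domain if it is identical or a dot-separated subdomain. */
function domainMatches(host, cookieDomain) {
  const h = host.toLowerCase()
  const d = cookieDomain.toLowerCase().replace(/^\./, '')
  return h === d || h.endsWith('.' + d)
}

/* A request path matches a cookie path if it is identical, or the cookie path
   is a prefix that ends on a "/" boundary. */
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true
  if (!requestPath.startsWith(cookiePath)) return false
  return cookiePath.endsWith('/') || requestPath[cookiePath.length] === '/'
}

/* Build the Cookie header value for a URL from a list of stored cookies. */
function cookiesFor(url, cookies) {
  const { hostname, pathname } = new URL(url)
  return cookies
    .filter((c) => domainMatches(hostname, c.domain) && pathMatches(pathname, c.path))
    .map((c) => `${c.name}=${c.value}`)
    .join('; ')
}

/* Hypothetical stored cookies. */
const stored = [
  { name: 'sid', value: 'abc', domain: 'example.test', path: '/' },
  { name: 'admin', value: '1', domain: 'example.test', path: '/admin' },
  { name: 'other', value: 'x', domain: 'other.test', path: '/' }
]

console.log(cookiesFor('https://example.test/secure', stored))
console.log(cookiesFor('https://app.example.test/admin/users', stored))
```

The first call should pick up only the site-wide cookie, while the second also gets the cookie scoped to /admin. Note that /administrator would not match a cookie scoped to /admin, because the prefix has to end on a path boundary.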

Read before the request, write after. jar.getCookieString(url) builds the Cookie header for that exact URL, so a cookie scoped to /secure does not leak onto a request for another path. After the response lands, response.headers.getSetCookie() returns the Set-Cookie headers as an array, and each one goes back into the jar with jar.setCookie(). The native Headers.get('set-cookie') collapses multiple cookies into one comma-joined string that does not round-trip, which is why getSetCookie() exists.
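
The difference between the two header accessors is easy to see without any network call. The sketch below builds a Headers object by hand using Node's global Headers class; the cookie names and values are made up, and it assumes a Node version that ships getSetCookie().

```javascript
/* Two cookies on one response, one of them with a comma inside its Expires date. */
const headers = new Headers()
headers.append('Set-Cookie', 'sid=abc; Expires=Wed, 21 Oct 2026 07:28:00 GMT; Path=/')
headers.append('Set-Cookie', 'theme=dark; Path=/')

/* get() joins both headers with ", ", so splitting on commas also cuts the date. */
const joined = headers.get('set-cookie')
const naiveSplit = joined.split(',')

/* getSetCookie() keeps one array element per Set-Cookie header. */
const separate = headers.getSetCookie()

console.log(naiveSplit.length) // 3: the comma in the date produced an extra fragment
console.log(separate.length)   // 2: one entry per cookie
```

Each element of the array can be passed to jar.setCookie() unchanged, which is why the wrapper loops over getSetCookie() rather than parsing the joined string.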

Handle the post-login redirect by hand. Many login forms answer with a 302 to the landing page and set the session cookie on that 302 response. With redirect: 'manual' you read the Set-Cookie off the redirect before following it, then re-request the Location as a GET carrying the now-populated jar. Let fetch auto-follow instead and your code only ever sees the final response, so a cookie set on the 302 never reaches the jar.
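
Always switching to GET is fine for the typical 302 or 303 after a login POST, but 307 and 308 redirects are meant to keep the original method and body. The following is a minimal sketch of that decision as a pure function, loosely following the method-rewriting rules browsers apply. The name planRedirect and the example.test URLs are hypothetical, and a real wrapper would also cap the number of hops and drop body-related headers such as Content-Type when the body is dropped.

```javascript
/* Decide what the next request should look like after a redirect response.
   Returns null when the response is not a redirect we can follow. */
function planRedirect(status, location, currentUrl, options = {}) {
  if (status < 300 || status >= 400 || !location) return null

  /* Location may be relative, so resolve it against the URL we just requested. */
  const url = new URL(location, currentUrl).href
  const method = (options.method || 'GET').toUpperCase()

  /* 303 turns everything except GET/HEAD into GET; 301 and 302 do so for POST.
     307 and 308 keep the method and body untouched. */
  const becomesGet =
    (status === 303 && method !== 'GET' && method !== 'HEAD') ||
    ((status === 301 || status === 302) && method === 'POST')

  if (!becomesGet) return { url, options }

  /* Drop the body and switch to GET for the follow-up request. */
  const { body, ...rest } = options
  return { url, options: { ...rest, method: 'GET' } }
}

const afterLogin = planRedirect(302, '/secure', 'https://example.test/login', {
  method: 'POST',
  body: 'username=practice'
})
console.log(afterLogin.url, afterLogin.options.method)
```

Keeping the decision separate from the fetch call makes it easy to check each status code without touching the network.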

Serialize the jar to keep the session. jar.serializeSync() returns a plain object you can write to disk as JSON. CookieJar.deserializeSync() rebuilds the same jar on the next run, so the second run starts already authenticated and skips the login POST until the session cookie expires.

Gotchas

  • fetch silently sends no cookies at all.

    • Issue: fetch(url) in Node has no cookie store, so the session cookie from your login response never reaches the next request and the server treats you as logged out.
    • Fix: route every call through a wrapper that calls jar.getCookieString(url) before the request and jar.setCookie() after, as in the script above.
  • headers.get('set-cookie') mangles multiple cookies.

    • Issue: when a response sets several cookies, response.headers.get('set-cookie') joins them into one comma-separated string, and tough-cookie cannot parse that back into separate cookies.
    • Fix: use response.headers.getSetCookie() (Node 18.14 and later), which returns each Set-Cookie header as its own array element.
  • Auto-following the login redirect drops the cookie.

    • Issue: with the default redirect: 'follow', fetch resolves the 302 internally and you only see the final response, so a session cookie set on the redirect itself is gone before you can store it.
    • Fix: pass redirect: 'manual', read getSetCookie() off the 3xx response, then re-request the Location yourself.
  • The session cookie expires while the scrape runs.

    • Issue: the server-side session has a lifetime even when the cookie itself carries no expiry, so a long-running job can keep replaying a cookie the server has already invalidated, and requests start coming back logged-out partway through.
    • Fix: treat a logged-out response (a redirect to /login or a missing post-auth marker) as a signal to log in again and refresh the jar, rather than retrying the same stale cookie.
  • localStorage tokens are not cookies.

    • Issue: some sites keep the auth token in localStorage and send it as an Authorization header from client JavaScript, so a cookie jar captures nothing and the HTTP path never authenticates.
    • Fix: drive the login in a browser and persist storageState, which carries both cookies and localStorage, instead of the fetch jar.
  • Reusing one saved session across parallel workers corrupts it.

    • Issue: pointing several concurrent processes at the same session.json lets them overwrite each other's cookies and interleave logins, which can invalidate the session server-side.
    • Fix: give each worker its own jar and its own session file, or serialize logins so one jar is established before the workers fan out.
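
The expired-session fix above can be sketched as a small check plus a one-shot retry. Everything here is an assumption for illustration: the names looksLoggedOut and withReauth, the /login path, and the idea that request() resolves to a plain { status, location, body } summary of the response. Treating 403 as logged-out is also site-specific.

```javascript
/* Heuristic: does this response summary look like the logged-out version? */
function looksLoggedOut({ status, location = '', body = '' }, { loginPath = '/login', marker } = {}) {
  if (status === 401 || status === 403) return true
  /* A redirect that points back at the sign-in form. */
  if (status >= 300 && status < 400 && location.includes(loginPath)) return true
  /* A 200 page that lacks the text only logged-in users see. */
  if (status === 200 && marker && !body.includes(marker)) return true
  return false
}

/* Run a request; if it comes back logged-out, log in once and try again.
   `request` and `login` are caller-supplied async functions. */
async function withReauth(request, login, isLoggedOut = looksLoggedOut) {
  let result = await request()
  if (isLoggedOut(result)) {
    await login()
    result = await request()
  }
  return result
}

console.log(looksLoggedOut({ status: 302, location: '/login?next=/secure' }))
```

Retrying only once keeps a permanently failing login from looping forever; a job that still looks logged-out after the second attempt should stop and report it.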

For the browser case, the same idea lives behind a built-in API. With Playwright, await context.storageState({ path: 'state.json' }) writes the cookies and localStorage to disk after you log in, and launching with browser.newContext({ storageState: 'state.json' }) restores them so the next run starts authenticated. With Puppeteer, read await page.cookies() to a file after login and replay them with await page.setCookie(...saved) before navigating. Both let you log in once interactively and reuse that session across runs.
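
A minimal sketch of the Playwright half might look like the following. It assumes browser is a Playwright Browser you launched yourself (for example with chromium.launch() from the playwright package); the function names, the #username and #password selectors, the submit button selector, and the '**/secure' URL pattern are placeholders to adapt to the target site.

```javascript
/* First run: log in through the page, then write cookies and localStorage to disk. */
async function saveLoginState(browser, { loginUrl, username, password, statePath = 'state.json' }) {
  const context = await browser.newContext()
  const page = await context.newPage()
  await page.goto(loginUrl)
  await page.fill('#username', username)
  await page.fill('#password', password)
  await page.click('button[type="submit"]')
  /* Wait for the post-login page so the session cookie exists before saving. */
  await page.waitForURL('**/secure')
  await context.storageState({ path: statePath })
  await context.close()
}

/* Later runs: open a context that already holds the saved cookies and localStorage. */
async function openSavedSession(browser, statePath = 'state.json') {
  return browser.newContext({ storageState: statePath })
}
```

Passing the browser in, rather than launching it inside the helpers, lets one launch serve both the login and the scraping contexts. The state file contains live session credentials, so it is worth keeping out of version control.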

Use this when

You need to stay logged in across requests while scraping in Node.js: a site behind a form login, an API that hands back a session cookie, or any flow where page two depends on a cookie set on page one. The fetch jar covers HTTP-level scraping; saving and restoring the browser session (storageState in Playwright, cookies in Puppeteer) covers browser-level scraping.

Skip this when

The site has no login and serves the same content to anonymous requests (drop the jar and fetch directly); the auth is a bearer token you already hold (send it as an Authorization header, no jar needed); the login is gated by a CAPTCHA or a Cloudflare challenge (solve that in a browser first, then export the cookie); or the token lives only in localStorage (use the browser storageState path rather than the fetch jar).
