How to scrape a JavaScript-rendered page in Node.js

Updated 2026-06-25 · 5 min read

If you've pointed fetch or a parser like Cheerio at a modern site and gotten back an empty shell, you're hitting the gap between the HTML the server sends and the HTML the browser ends up showing. Client-rendered apps built with React, Vue, or Svelte typically ship a near-empty container such as <div id="root"> and fill it in with JavaScript after load, and many server-rendered sites still fetch part of their content on the client after hydration. Your request only ever sees the first version, so the list, the prices, or the article text you wanted are not in the bytes you got.

The solution is to run the page in a headless browser - a full Chromium instance with no visible window, driven by code - so that the page's JavaScript actually executes, and then to wait until the content you want has appeared before reading the DOM. We'll build a small script that launches headless Chromium so the page renders the way it would for a visitor, navigates to a page whose content is injected client-side, waits for the specific element that page renders so the read never runs against an empty shell, and reads the rendered values out of the live DOM. It comes out to about 30 lines of Node.js with one dependency, the puppeteer package.

The complete script

// scrape-rendered-page.mjs
import puppeteer from 'puppeteer'

// quotes.toscrape.com/js renders its quotes with client-side JavaScript.
// a plain fetch of this URL returns an empty container; the browser fills it in.
const url = 'https://quotes.toscrape.com/js/'

const browser = await puppeteer.launch({ headless: true })

try {
  const page = await browser.newPage()

  // domcontentloaded fires once the HTML is parsed; the JS that injects
  // the quotes has not necessarily run yet, so do not read the DOM here.
  await page.goto(url, { waitUntil: 'domcontentloaded' })

  // wait for the actual content, not a fixed sleep. this resolves the moment
  // the first .quote element exists, and throws after 10s if it never does.
  await page.waitForSelector('.quote', { timeout: 10_000 })

  // read values out of the rendered DOM, inside the page's own JS context.
  const quotes = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.quote')).map((el) => ({
      text: el.querySelector('.text')?.textContent ?? '',
      author: el.querySelector('.author')?.textContent ?? ''
    }))
  )

  console.log(`Got ${quotes.length} quotes`)
  console.log(quotes)
} finally {
  // close the browser even if a selector wait or navigation throws,
  // otherwise the Chromium process leaks and the script hangs.
  await browser.close()
}

Install the dependency and run the script:

npm install puppeteer
node scrape-rendered-page.mjs

How it works

Launch with headless: true. This is the current default and runs Chromium with no visible window. Set it to false while you are debugging so you can watch the page render, and switch it back for unattended runs. Do not pass the old 'new' string, which recent Puppeteer versions have dropped. Some sites serve a challenge page to the default headless build, whose User-Agent advertises HeadlessChrome, so the selector you expect never renders and the wait times out. Set a stock desktop User-Agent before calling page.goto, with await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'), and for harder anti-bot fronts see How to patch headless Chrome to avoid detection.
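
Hard-coding Chrome/124 works, but the version in that string drifts away from the Chromium build Puppeteer actually launches, a mismatch some bot checks may notice. A minimal sketch of one alternative, assuming the launched browser's own User-Agent contains the HeadlessChrome token (as it does in headless mode): read it with browser.userAgent() and swap that token for Chrome. The helper name toStockUserAgent is hypothetical, not a Puppeteer API.

```javascript
// Hypothetical helper: turn the headless User-Agent into a stock desktop one by
// replacing the "HeadlessChrome/<version>" token with "Chrome/<version>".
// The version number is kept, so it matches the Chromium that is actually running.
function toStockUserAgent(headlessUserAgent) {
  return headlessUserAgent.replace('HeadlessChrome/', 'Chrome/')
}

// Usage inside the script, after launch and before page.goto:
//   const page = await browser.newPage()
//   await page.setUserAgent(toStockUserAgent(await browser.userAgent()))
//   await page.goto(url, { waitUntil: 'domcontentloaded' })
```

A string that does not contain the HeadlessChrome token comes back unchanged, so the call is safe to leave in when you switch headless to false for debugging.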

Navigate with waitUntil: 'domcontentloaded'. This returns once the initial HTML is parsed, which is fast, but the framework's JavaScript may not have populated the page yet, so do not read the DOM at this point. The alternative, networkidle0, waits until there have been no network connections for at least 500 ms. That is slower, and on pages that poll, stream, or fire analytics beacons the network may never go quiet, so page.goto stalls until its navigation timeout (30 seconds by default) and then throws. A domcontentloaded navigation followed by an explicit element wait is the more predictable pair.
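
The navigate-then-wait pair can be wrapped in one function. The sketch below is a hypothetical helper, openAndWaitFor, not part of Puppeteer: it takes an open page, navigates with domcontentloaded, then waits for the selector, and rewraps any failure with the stage it came from so logs from unattended runs say whether the navigation or the element wait failed. The 30-second navigation timeout mirrors Puppeteer's default; the 10-second selector timeout is an assumption that matches the script.

```javascript
// Hypothetical helper: domcontentloaded navigation followed by an explicit element
// wait, with each failure rewrapped so the message names the stage that failed.
async function openAndWaitFor(
  page,
  url,
  selector,
  { navigationTimeout = 30_000, selectorTimeout = 10_000 } = {}
) {
  try {
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: navigationTimeout })
  } catch (err) {
    throw new Error(`navigation to ${url} failed: ${err.message}`, { cause: err })
  }

  try {
    // Resolves with a handle to the first matching element once it exists.
    return await page.waitForSelector(selector, { timeout: selectorTimeout })
  } catch (err) {
    throw new Error(`page loaded but ${selector} never appeared: ${err.message}`, { cause: err })
  }
}

// Usage inside the script's try block:
//   await openAndWaitFor(page, url, '.quote')
```

The original error stays available as err.cause, so nothing from Puppeteer's own message is lost.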

Wait for the rendered element, not a timer. page.waitForSelector('.quote') resolves the instant the first matching element appears, so the read runs against populated content rather than the empty shell. It is bounded by timeout: 10_000, so a page that never renders the selector fails in ten seconds instead of hanging. A fixed setTimeout either wastes time on fast pages or fires too early on slow ones. When you need a count rather than a first appearance, wait on page.waitForFunction(() => document.querySelectorAll('.quote').length >= 10) instead.
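
The count-based wait generalizes into a small helper if you want to pass the selector and threshold in rather than hard-code them. A sketch, with waitForCount as a hypothetical name: page.waitForFunction accepts extra arguments after its options object and forwards them to the predicate, which matters because the predicate is serialized into the page and cannot see Node variables.

```javascript
// Hypothetical helper: wait until at least `min` elements match `selector`.
// page.waitForFunction(pageFunction, options, ...args) re-runs the predicate in the
// page until it returns a truthy value, and rejects once `timeout` ms have passed.
function waitForCount(page, selector, min, { timeout = 10_000 } = {}) {
  return page.waitForFunction(
    // This function is serialized and run in the browser, so it cannot close over
    // Node variables; selector and min are passed in as arguments instead.
    (sel, n) => document.querySelectorAll(sel).length >= n,
    { timeout },
    selector,
    min
  )
}

// Usage, in place of the single waitForSelector call:
//   await waitForCount(page, '.quote', 10)
```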

Read inside page.evaluate. The callback runs in the page's own context, where document and the hydrated DOM exist. Pull out plain strings and numbers and return them; Puppeteer serializes the result back to Node. DOM nodes themselves cannot cross that boundary, so map them to values before you return. The wait only confirms that the first batch rendered, so for content behind a "load more" button, infinite scroll, or pagination, drive the interaction first by clicking the button or scrolling, then read once the new items have appeared.
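
For a "load more" button, one way to drive the interaction is a loop that clicks, waits for the item count to grow, and repeats until the button is gone. This is a sketch under assumptions: loadAll is a hypothetical helper, buttonSelector and itemSelector are placeholders for whatever the target page uses, and the button is assumed to append items in place rather than navigate to a new URL. maxClicks caps the loop in case the button never disappears.

```javascript
// Hypothetical helper: click a "load more" button until it disappears or maxClicks
// is reached, waiting after each click for the item count to grow.
// buttonSelector and itemSelector are placeholders for the target page's markup.
async function loadAll(page, { buttonSelector, itemSelector, maxClicks = 20, timeout = 10_000 }) {
  for (let clicks = 0; clicks < maxClicks; clicks++) {
    const button = await page.$(buttonSelector)
    if (!button) break // no button left: assume everything is loaded

    // $$eval runs in the page; return a number, not the DOM nodes themselves.
    const before = await page.$$eval(itemSelector, (els) => els.length)
    await button.click()

    // Bounded wait for new items, using the same in-page predicate pattern.
    await page.waitForFunction(
      (sel, n) => document.querySelectorAll(sel).length > n,
      { timeout },
      itemSelector,
      before
    )
  }
  return page.$$eval(itemSelector, (els) => els.length)
}
```

After it returns, run the same page.evaluate read as in the script. If a click loads nothing new, the waitForFunction call rejects after timeout instead of looping forever.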

Use this when

A page renders its content with client-side JavaScript, so a plain HTTP fetch comes back as an empty shell and you need the browser to run the page before you can read it. Single-page apps, server-rendered sites that load part of their content after hydration, and pages that request their data with XHR or fetch after load all fall here.

Skip this when

The HTML already contains the data on the first request, in which case fetch plus Cheerio is faster and lighter. The values come from a JSON API the page calls, in which case hitting that endpoint directly skips rendering entirely. You only need the cleaned article text, in which case a readability pass on the rendered HTML is a better tool than hand-written selectors. Or the page sits behind a login, in which case you handle authentication before any of this.
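
Before reaching for a browser, it can be worth checking whether the data is already in the server's response. A minimal sketch, assuming Node 18 or later for the global fetch: isInInitialHtml is a hypothetical helper that fetches the raw HTML, so no JavaScript runs, and reports whether a string you expect to scrape appears in it. The fetchImpl parameter exists only so the function can be tested without the network.

```javascript
// Hypothetical pre-check: fetch the raw HTML (no JavaScript runs) and report whether
// text you expect to scrape is already in it. If it is, fetch plus an HTML parser is
// likely enough; if not, the content is probably rendered client-side.
// Uses the global fetch from Node 18+; fetchImpl can be swapped out for testing.
async function isInInitialHtml(url, expectedText, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url)
  if (!res.ok) throw new Error(`GET ${url} returned ${res.status}`)
  const html = await res.text()
  return html.includes(expectedText)
}

// Usage (top-level await needs an .mjs file):
//   console.log(await isInInitialHtml('https://example.com/products', 'Add to cart'))
```

A miss is not conclusive: raw HTML may encode apostrophes or ampersands as entities, so choose a plain word or phrase. A hit inside an inline <script> rather than in the markup usually means the data shipped as embedded JSON or a JavaScript literal, which can often be extracted without rendering.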

Skip the code, just get the data
Simplescraper turns any website into structured data in seconds.