How to scrape a single-page app (React/Vue) in Node.js

Updated 2026-06-25 · 6 min read

If you fetched a React or Vue page and logged the HTML, you probably got back a near-empty shell: a <div id="root"></div> or <div id="app"></div>, a couple of bundled script tags, and none of the products, posts, or rows you came for. The data is there when you open the page in a browser, but a plain fetch only sees the server's first response, which is sent before the framework has run a single line of JavaScript.
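To confirm that a page is client-rendered before reaching for a browser, you can fetch it and check whether the mount point is empty. The sketch below is a rough heuristic rather than a parser: looksLikeEmptyShell is a hypothetical helper name, and it assumes the app mounts on an element with the id root or app, as in the examples above.

```javascript
// Heuristic only: true when the HTML contains an empty <div id="root"> or <div id="app">.
// Assumption: the framework mounts on one of those two ids; other apps use other ids.
function looksLikeEmptyShell(html) {
  const emptyMount = /<div[^>]*\bid=["'](?:root|app)["'][^>]*>\s*<\/div>/i
  return emptyMount.test(html)
}

// Usage (Node 18+ ships a global fetch):
//   const html = await (await fetch(url)).text()
//   console.log(looksLikeEmptyShell(html) ? 'client-rendered shell' : 'content is in the HTML')
```

If the check comes back false, the content may already be in the server response, and the lighter approaches listed under "Skip this when" are worth trying first.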

The solution is to drive a real Chromium with Puppeteer, an automation library that runs a headless browser, so the page's JavaScript actually executes. We'll build a small script that does three things. It launches the browser and navigates until the network goes quiet, so the framework's initial data fetch can complete. It then waits for the rendered items to appear in the DOM, so we read after React or Vue has painted rather than before. Finally, it pulls the fields out of each card and prints them as JSON. The step people get wrong is the wait: navigating is not enough, because the markup you want appears a few hundred milliseconds later, when an XHR resolves. The whole thing is about 30 lines of Node.js with one library.

The complete script

// scrape-spa.mjs
import puppeteer from 'puppeteer'

// a Vue-rendered demo store. the initial HTML is an empty #app shell;
// the product grid is fetched and rendered client-side after load.
const url = 'https://vuejs-demo-store.netlify.app/'
const itemSelector = '.product-card'

const browser = await puppeteer.launch({ headless: true })

try {
  const page = await browser.newPage()

  // Puppeteer's default headless User-Agent contains "HeadlessChrome", which some sites block; send a stock desktop one.
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'
  )

  // 'networkidle2' resolves once there are <=2 connections for 500ms,
  // which usually means the framework's data XHR has come back.
  await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 })

  // network-idle is not the same as content-rendered. wait for the items
  // themselves, polling until at least one matches, capped at 15s.
  await page.waitForFunction(
    (selector) => document.querySelectorAll(selector).length > 0,
    { timeout: 15000 },
    itemSelector
  )

  // run in the page context once the DOM has the data, and read each card's fields.
  const products = await page.$$eval(itemSelector, (cards) =>
    cards.map((card) => ({
      name: card.querySelector('.product-name')?.textContent?.trim() ?? null,
      price: card.querySelector('.product-price')?.textContent?.trim() ?? null
    }))
  )

  console.log(`Found ${products.length} products`)
  console.log(JSON.stringify(products, null, 2))
} finally {
  // close the browser even if a wait times out, so the process exits.
  await browser.close()
}

Install and run (bash):

npm install puppeteer
node scrape-spa.mjs

How it works

Launch with headless: true. Puppeteer downloads its own browser build on install (Chrome for Testing in recent versions) and runs it without a window. The browser executes the page's bundle the way a user's browser would, which is the whole reason a SPA becomes readable. Wrap everything after the launch in try/finally so a timeout still reaches browser.close(). Headless can also lay the page out differently from the window you tested in: Puppeteer's default viewport is a narrow 800×600, which can trigger a mobile layout or lazy-render fewer cards. Set it explicitly with await page.setViewport({ width: 1366, height: 900 }) before goto to match what you inspected.
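As a minimal sketch of the viewport step, the helper below opens a page and sizes it before any navigation happens. newDesktopPage is a hypothetical name, and browser is assumed to be the object returned by puppeteer.launch().

```javascript
// Hypothetical helper: open a page with an explicit desktop-sized viewport.
// Assumption: `browser` is the object returned by puppeteer.launch().
async function newDesktopPage(browser, viewport = { width: 1366, height: 900 }) {
  const page = await browser.newPage()
  // Set the size before navigating so responsive layouts and lazy rendering
  // see the same dimensions as the window you inspected.
  await page.setViewport(viewport)
  return page
}

// Usage inside the script's try block:
//   const page = await newDesktopPage(browser)
```

Passing the viewport as a parameter makes it easy to rerun the scrape at a different size if the card count looks off.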

Set a stock desktop User-Agent. Puppeteer's default UA contains HeadlessChrome, which some sites key on to serve a stripped page or a block. Swapping in a regular Chrome-on-macOS string sidesteps the simplest checks. This handles polite servers, not aggressive anti-bot systems, which need more than a header.
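A hard-coded string drifts out of date as Chrome versions move on. One possible alternative, sketched here under the assumption that the only giveaway you care about is the HeadlessChrome token, is to take the browser's own User-Agent and strip the marker. stripHeadlessMarker is a hypothetical helper name; browser.userAgent() is Puppeteer's method for reading the default string.

```javascript
// Hypothetical helper: make the headless UA look like a regular Chrome UA
// by dropping the "Headless" marker while keeping the real version number.
function stripHeadlessMarker(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome')
}

// Usage inside the script, before page.goto():
//   await page.setUserAgent(stripHeadlessMarker(await browser.userAgent()))
```

This keeps the reported version in step with the browser that is actually running, but like the fixed string it only addresses the simplest checks.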

Navigate with waitUntil: 'networkidle2'. A SPA's content lands after the framework boots and its first data request returns. networkidle2 holds goto until there have been no more than two open connections for at least 500ms, which usually lines up with that XHR resolving. The 30-second timeout bounds a page that never quiets down. On a chatty page where analytics beacons, websockets, or polling keep connections open, the network never reaches idle, so goto waits the full timeout and then throws a TimeoutError. In that case, switch the navigation to waitUntil: 'domcontentloaded' and lean on the content wait below for the real signal.
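For the chatty-page case, the variant might look like the sketch below: navigate on domcontentloaded, then let the content wait carry the timing. gotoThenWaitForContent is a hypothetical helper name, and it uses page.waitForSelector, which for a simple "at least one match" check behaves like the waitForFunction call in the script.

```javascript
// Hypothetical helper for pages whose network never goes idle.
// Assumption: `page` is a Puppeteer Page and `itemSelector` matches the rendered items.
async function gotoThenWaitForContent(page, url, itemSelector) {
  // Return as soon as the HTML is parsed instead of waiting for a quiet network.
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 })
  // The real readiness signal: at least one rendered item exists in the DOM.
  await page.waitForSelector(itemSelector, { timeout: 15000 })
}

// Usage:
//   await gotoThenWaitForContent(page, url, itemSelector)
```

The timeouts mirror the ones in the main script; if the items never render, waitForSelector rejects and the finally block still closes the browser.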

Wait for the items with waitForFunction. Network-idle and content-rendered are different moments: idle can fire while the list is still being built into the DOM. The predicate polls inside the page until at least one .product-card exists, so you read after the cards mount. This is the line that separates a reliable SPA scrape from an empty array. Pick a selector that survives a redeploy: a build-time CSS-modules hash like .product-card__a8f3c changes every build, so target a stable attribute such as [data-testid="product-card"] instead. And if the page renders only the first screen and fetches the rest on scroll, one wait for one card undercounts the list, so scroll in a loop with page.evaluate(() => window.scrollTo(0, document.body.scrollHeight)) and re-check the count until it stops growing (see How to handle "load more" buttons when scraping).
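The scroll-and-recount loop described above could be sketched as follows. scrollUntilStable, maxRounds, and pauseMs are hypothetical names and defaults chosen for illustration; the pause length in particular depends on how quickly the site loads the next batch.

```javascript
// Hypothetical helper: scroll to the bottom repeatedly until the number of
// matching items stops growing, or until maxRounds scrolls have been made.
async function scrollUntilStable(page, selector, { maxRounds = 20, pauseMs = 1000 } = {}) {
  let previous = -1
  let current = await page.$$eval(selector, (els) => els.length)
  let rounds = 0

  while (current > previous && rounds < maxRounds) {
    previous = current
    // Runs in the page: jump to the bottom to trigger the next lazy load.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight))
    // Give the page time to fetch and render the next batch (assumed delay).
    await new Promise((resolve) => setTimeout(resolve, pauseMs))
    current = await page.$$eval(selector, (els) => els.length)
    rounds += 1
  }
  return current
}

// Usage, after the first waitForFunction succeeds:
//   const total = await scrollUntilStable(page, itemSelector)
```

The maxRounds cap keeps a truly endless feed from looping forever; a fixed pause is the simplest option, though a slow response can make the count look stable one round too early.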

Read fields with page.$$eval. The callback runs in the page, receives every node matching the selector, and returns plain objects back to Node. Because it runs in the browser context and not Node, it cannot see outer variables like itemSelector unless you pass them as trailing arguments (page.$$eval(sel, (cards, n) => ..., maxItems)), and only serializable values cross that boundary. Optional chaining and a null fallback mean a card missing a price yields null for that field instead of throwing and losing the whole run.
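To make the argument-passing rule concrete, here is a small sketch that reads at most maxItems cards. readFirstCards and maxItems are hypothetical names; the class names are the ones used in the script above.

```javascript
// Hypothetical helper: read name and price from the first `maxItems` cards.
// Assumption: `page` is a Puppeteer Page; maxItems must be serializable (a number here).
async function readFirstCards(page, selector, maxItems) {
  return page.$$eval(
    selector,
    // Runs in the browser: `limit` arrives as an argument because the
    // callback cannot see variables from the surrounding Node scope.
    (cards, limit) =>
      cards.slice(0, limit).map((card) => ({
        name: card.querySelector('.product-name')?.textContent?.trim() ?? null,
        price: card.querySelector('.product-price')?.textContent?.trim() ?? null
      })),
    maxItems
  )
}

// Usage:
//   const firstTen = await readFirstCards(page, itemSelector, 10)
```

Referencing maxItems directly inside the callback would fail in the page with a ReferenceError, since only the function's source and its serialized arguments are sent to the browser.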

Use this when

You need data from a React, Vue, Angular, or other client-rendered page where a plain fetch returns an empty shell, and the content you want lands in the DOM after the framework runs. Product grids, dashboards, infinite feeds, and search results all fit.

Skip this when

The page is server-rendered or static (a plain fetch plus an HTML parser like cheerio is faster and lighter); the data comes from a JSON endpoint you can call directly (hit that XHR and skip the browser entirely); you only need the article text rather than structured fields (run the rendered HTML through Readability and Turndown); the site sits behind aggressive anti-bot defenses (you need a stealth patch and proxies on top of this).
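For the JSON-endpoint case, a sketch of skipping the browser entirely might look like this. fetchJson is a hypothetical helper, and the endpoint in the usage comment is a placeholder: the real URL and response shape have to be read from the Network tab of the site you are scraping.

```javascript
// Hypothetical helper: call a JSON endpoint directly instead of rendering the page.
// `fetchImpl` defaults to the global fetch available in Node 18+.
async function fetchJson(endpoint, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(endpoint, {
    headers: { accept: 'application/json' }
  })
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }
  return response.json()
}

// Usage (placeholder URL, not a real endpoint):
//   const data = await fetchJson('https://example.com/api/products')
```

Some endpoints also expect cookies, tokens, or a Referer header that the page sends automatically, in which case the browser approach above may still be the simpler route.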
