How to wait for an element to load before scraping

Updated 2026-06-25 · 5 min read

If you're scraping a page whose content is rendered by JavaScript, you have probably read the DOM a moment too early and gotten back an empty string, a null, or a loading spinner instead of the value you wanted. The element shows up in the markup you inspect in DevTools, but it is not there yet when your code looks, because the browser is still fetching and rendering it after the initial HTML loads.

The fix is to pause your scraper until the element actually exists in the live DOM, rather than guessing with a fixed sleep. Puppeteer does this with page.waitForSelector, which polls the page and resolves as soon as a node matching your selector appears, so you can read it right away. The whole script takes about 30 lines of Node.js and one open-source library.

Key terms

  • page.waitForSelector. A Puppeteer method that returns a promise which resolves once an element matching a CSS selector is in the DOM, with options for a timeout and for requiring the element to be visible.
  • waitUntil. The page.goto option that decides when navigation is considered done; networkidle2 treats navigation as finished once there have been no more than two open network connections for at least 500 ms.
  • Visible. Puppeteer counts an element as visible when it has a non-empty bounding box and is not hidden by display:none or visibility:hidden, which is stricter than merely being present in the DOM.

Here is what the script does:

  • Launch headless Chrome with Puppeteer and open the page that renders its content after load.
  • Wait for the target element with page.waitForSelector, passing a timeout and visible: true so a node that exists but is still hidden does not satisfy the wait.
  • Read the element's text out of the live page once the wait resolves.
  • Handle the timeout case so a slow or missing element fails with a clear message instead of an unhandled rejection.

The complete script

js
// wait-for-element.mjs
import puppeteer from 'puppeteer'

const url = 'https://www.scrapethissite.com/pages/ajax-javascript/'
const selector = '.film-title'

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

try {
  /* networkidle2 lets the first XHR batch settle, but it is not a guarantee the
     element you want has rendered, so still wait for the selector explicitly.
     goto sits inside the try so a failed navigation also reaches the finally. */
  await page.goto(url, { waitUntil: 'networkidle2' })

  /* Poll the live DOM until a node matching the selector is present AND visible.
     visible: true keeps waiting while the node is display:none, visibility:hidden,
     or zero-size. The default timeout is 30s; 15s is enough for most renders and
     fails faster. */
  await page.waitForSelector(selector, { visible: true, timeout: 15000 })

  /* The element is now in the DOM. Read every match in one pass. */
  const titles = await page.$$eval(selector, els =>
    els.map(el => el.textContent.trim())
  )

  console.log(`Found ${titles.length} titles`)
  console.log(titles.slice(0, 5))
} catch (err) {
  /* goto and waitForSelector both throw a TimeoutError when they run out of time.
     Treat that as "the content did not load", not as a crash, so the run reports
     cleanly. */
  console.error(`Could not read "${selector}" from ${url}: ${err.message}`)
} finally {
  await browser.close()
}

Install Puppeteer and run the script:

bash
npm install puppeteer
node wait-for-element.mjs

What each step does

Open the page with networkidle2. A page that builds itself from XHR has nothing useful in its first HTML response. Waiting for the network to go quiet means the page has had a chance to fire its data requests before the script moves on, so waitForSelector is not starting against a bare shell. It is a head start, not a finish line, which is why the explicit selector wait still follows.

Wait for the selector, with visible: true. page.waitForSelector(selector, { visible: true }) keeps re-checking the page (on animation frames when a visibility option is set) and resolves as soon as a matching node is present and rendered. Passing visible: true makes the wait skip a node that is in the DOM but display:none, visibility:hidden, or zero-size, which keeps hidden pre-render placeholders from satisfying it. The timeout is in milliseconds and defaults to 30000; lowering it to 15000 fails a dead page sooner.

Read the element once the wait resolves. With the node guaranteed present, page.$$eval(selector, ...) runs a function over every match inside the page context and returns plain data. Reading text only after the wait resolves is the whole point: the value is there now, so textContent returns the rendered string rather than an empty node.

Handle the timeout instead of letting it throw. When the element never appears, waitForSelector rejects with a TimeoutError. Wrapping the wait in try/catch turns that into a logged message naming the selector, and the finally closes the browser on both the success and the failure path so a timeout does not leak a Chrome process.
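
The wait, the read, and the timeout handling described above can also be folded into one function, which is convenient when the same wait runs on many pages. This is a minimal sketch rather than part of the script: readWhenVisible is a hypothetical helper name, its page argument is assumed to be a Puppeteer Page, and it assumes a timeout surfaces as an error whose name is TimeoutError.

```javascript
// Hypothetical helper: wait for a selector, then read the text of every match.
// Resolves to an array of strings, or to null if the wait timed out.
async function readWhenVisible(page, selector, timeout = 15000) {
  try {
    // Hold until a matching node is present and visible.
    await page.waitForSelector(selector, { visible: true, timeout })

    // Runs inside the page context and returns plain strings.
    return await page.$$eval(selector, els =>
      els.map(el => el.textContent.trim())
    )
  } catch (err) {
    // Only a timeout means "the content did not load"; rethrow anything else.
    if (err.name === 'TimeoutError') return null
    throw err
  }
}
```

Called as const titles = await readWhenVisible(page, selector), a null result marks that page as not loaded while any other error still surfaces to the caller.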

Gotchas

  • A fixed sleep is either too short or too slow.

    • Issue: await new Promise(r => setTimeout(r, 3000)) guesses at the render time. On a fast load it wastes seconds on every page, and on a slow one it still reads the DOM before the element is there.
    • Fix: replace the sleep with page.waitForSelector(selector, { visible: true }), which returns the moment the element renders and no later.
  • The element exists in the DOM but is still hidden.

    • Issue: page.waitForSelector(selector) with no options resolves as soon as the node is attached, even while it sits inside a display:none wrapper or has zero size, so the read that follows gets empty or placeholder text.
    • Fix: pass { visible: true } so the wait holds until the element has a non-empty bounding box and is not hidden.
  • The selector is matched in the wrong frame.

    • Issue: the content lives inside an <iframe>, so page.waitForSelector polls the top document and times out even though the element is on screen.
    • Fix: get the frame with page.frames().find(f => f.url().includes('embed')) and call waitForSelector on the frame. See How to scrape an iframe's contents in Puppeteer.
  • The text is present but not yet populated.

    • Issue: the node renders empty first and its value streams in a tick later, so visible: true is satisfied while textContent is still blank.
    • Fix: wait on the content, not just the node, with page.waitForFunction(sel => document.querySelector(sel)?.textContent.trim().length > 0, {}, selector).
  • A TimeoutError crashes the whole run.

    • Issue: an uncaught waitForSelector rejection on one page aborts a batch loop, so the remaining pages never run, and it leaves the Chrome instance open.
    • Fix: wrap the wait in try/catch, log the selector that failed, and close the browser in a finally block as the script does.
  • The wait is correct but the page never loads the data, because the site blocked the bot.

    • Issue: the selector never appears, not because of timing but because the site served a challenge page or an empty stub to a headless client, so every wait runs to its full timeout.
    • Fix: confirm the real page rendered before blaming the wait. Log page.url() and a snippet of await page.content() on timeout, and if it is a block, address detection first. See How to patch headless Chrome to avoid detection.
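
Two of the fixes above, waiting on the content rather than the node and logging what the browser actually loaded, can be sketched as small helpers. The code is illustrative only: waitForText and describeTimeout are hypothetical names, target is assumed to be a Puppeteer Page or Frame (both expose waitForFunction), and the 200-character snippet length is an arbitrary choice.

```javascript
// Hypothetical helper: wait until the first match for `selector` has non-empty text.
// `target` may be a Page or a Frame, so the same call also works inside an iframe.
async function waitForText(target, selector, timeout = 15000) {
  await target.waitForFunction(
    sel => {
      // This function runs in the browser, where `document` is the live DOM.
      const el = document.querySelector(sel)
      return el !== null && el.textContent.trim().length > 0
    },
    { timeout },
    selector
  )
}

// Hypothetical helper: summarise what was actually loaded, to help tell a slow
// render apart from a challenge page or an empty stub.
async function describeTimeout(page, selector, snippetLength = 200) {
  const html = await page.content()
  return `"${selector}" not found at ${page.url()}: ${html.slice(0, snippetLength)}`
}

// Usage sketch (inside an async function, with `page` already navigated):
//   try {
//     await waitForText(page, '.film-title')
//   } catch (err) {
//     console.error(await describeTimeout(page, '.film-title'))
//   }
```

For content inside an iframe, the frame found with page.frames().find(...) can be passed as target in place of the page.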

Use this when

The value you need is rendered by client-side JavaScript after the initial HTML, so a one-shot read of the DOM returns nothing, and you want the scrape to start the instant the element is ready rather than after a guessed delay.

Skip this when

The element is already in the server's HTML response (a plain fetch and a parser like cheerio is lighter than a browser); the content loads only as you scroll it into view (use a lazy-load or infinite-scroll loop); the data arrives through an XHR you can read directly (intercept that response instead of waiting on the rendered node); or the element appears only after a click or form submit (drive the interaction first, then wait).
