How to scrape an iframe's contents in Puppeteer

Updated 2026-06-18 · 6 min read

If you're scraping a page where the content you want sits inside an iframe, you're probably watching your selectors come back empty even though you can see the text right there in the browser. An iframe is a separate browsing context with its own document, so the parent page's document.querySelector stops at the <iframe> tag and does not reach what's underneath. It gets more complicated when the frame is cross-origin or sandboxed.

The solution is to get a handle to the right frame object first and run your extraction inside it, so the framed document's own DOM is what your code reads. Puppeteer models each iframe as a Frame with the same evaluate and waitForSelector methods as a page, so once you cross into it the work is ordinary scraping.

We'll build a small script that launches Chromium and opens a page that embeds an iframe, then does four things in order. It gets a handle to the <iframe> element, crosses into the frame's own document so its DOM is in reach, waits for a known selector inside the frame so extraction runs only once the framed content has loaded, and reads the text and attributes it wants back as plain JavaScript values. For cases where you have no element handle to start from, the How it works section covers a fallback: walking every frame on the page and matching on URL. The script comes out to about 40 lines of Node.js with one dependency, the puppeteer package.

The complete script

// scrape-iframe.mjs
import puppeteer from 'puppeteer'

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

// MDN's iframe reference page embeds a live demo in an <iframe>.
await page.goto('https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe', {
  waitUntil: 'networkidle2'
})

// 1. get a handle to the <iframe> element in the parent document.
//    the frame may render after navigation, so wait for the element itself.
const frameElement = await page.waitForSelector('iframe.sample-code-frame', { timeout: 15000 })

// 2. cross from the <iframe> element into its own document.
//    contentFrame() returns a Frame, which has the same query/evaluate API as a Page.
const frame = await frameElement.contentFrame()
if (!frame) throw new Error('iframe has no attached content frame yet')

// 3. wait for content *inside the frame*. the parent being idle does not mean
//    the framed document has painted. wait on a selector the frame owns.
await frame.waitForSelector('body', { timeout: 15000 })

// 4. extract inside the frame. this runs in the frame's context, so its own
//    document.querySelector sees the framed DOM, not the parent's.
const data = await frame.evaluate(() => {
  const links = [...document.querySelectorAll('a')].map(a => ({
    text: a.textContent.trim(),
    href: a.href
  }))
  return {
    title: document.title,
    bodyText: document.body.innerText.slice(0, 200),
    linkCount: links.length,
    firstLinks: links.slice(0, 3)
  }
})

console.log(JSON.stringify(data, null, 2))

await browser.close()

Install the dependency and run the script:

npm install puppeteer
node scrape-iframe.mjs

How it works

Get the iframe element, then contentFrame(). page.waitForSelector('iframe...') returns an ElementHandle for the <iframe> tag in the parent. Calling frameElement.contentFrame() on that handle returns the Frame for the document inside. This is the cleanest path when you can target the iframe with a CSS selector. contentFrame() returns null when no attached frame was found for that element yet, so guard the result before using it. Reading the same iframe from a parent-page script, for example through the element's contentDocument inside page.evaluate, runs into the same-origin policy when the frame is cross-origin: contentDocument is null there, and a document would not serialize back to Node in any case. The Frame you get from contentFrame() talks to its own context directly over the DevTools Protocol, so a different origin is not a problem this way.

Wait for a selector the frame owns. Puppeteer drives Chromium over the DevTools Protocol, and frame attachment is asynchronous. The parent reaching networkidle2 says nothing about whether the framed document has parsed. Calling frame.waitForSelector('body') (or a more specific element you plan to read) blocks until that node exists inside the frame, so your extraction is not racing the load. A frame that loads on scroll or click has not even started its request when the parent goes idle, so trigger the interaction first with page.click or page.evaluate(() => window.scrollTo(...)), then wait for the frame.
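The trigger-then-wait order can be sketched as below, assuming the frame loads lazily once its <iframe> element scrolls into view. openLazyFrame and its two selector parameters are hypothetical names for illustration, not part of Puppeteer; the helper only relies on waitForSelector, ElementHandle.evaluate, and contentFrame.

```javascript
// hypothetical helper: trigger a lazily loaded iframe, then wait inside it.
// assumes the frame starts loading when its <iframe> element scrolls into view.
async function openLazyFrame(page, iframeSelector, readySelector, timeout = 15000) {
  // the <iframe> element lives in the parent document, so wait for it on the page.
  const frameElement = await page.waitForSelector(iframeSelector, { timeout })

  // trigger the load first: scroll the element into view in the parent document.
  await frameElement.evaluate(el => el.scrollIntoView({ block: 'center' }))

  // cross into the framed document and guard against a missing frame.
  const frame = await frameElement.contentFrame()
  if (!frame) throw new Error(`no content frame for ${iframeSelector}`)

  // only now wait on a selector the frame owns.
  await frame.waitForSelector(readySelector, { timeout })
  return frame
}
```

A call might look like const frame = await openLazyFrame(page, 'iframe.lazy-widget', '.widget-root'), with both selectors swapped for ones that exist on your page. If the frame loads on click rather than on scroll, replace the scroll step with a page.click on whatever opens it.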

Extract with frame.evaluate(). The callback runs inside the frame's JavaScript context. Its document is the framed document, so document.querySelectorAll('a') returns the frame's links, not the parent's. Whatever you return must be serializable, since it crosses the protocol boundary back to Node as JSON. DOM nodes do not survive the trip; return strings, numbers, and plain objects. If the frame navigates or the parent removes it while the evaluate is running you get Error: Execution context was destroyed or Attempted to use detached Frame, so after any action that could reload the frame, re-fetch it from page.frames() and wrap the evaluate in a try/catch that retries once on detachment.
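A minimal sketch of the retry-once pattern described above. evaluateWithRetry and getFrame are hypothetical names: getFrame is any function you supply that re-fetches the frame, for example from page.frames(). Detecting detachment by matching the error message text is an assumption based on the two messages quoted above, and the exact wording may vary between Puppeteer versions.

```javascript
// substrings of the two error messages quoted above (assumed stable enough to match on).
const DETACH_ERRORS = ['Execution context was destroyed', 'detached Frame']

// hypothetical helper: run pageFunction in the frame, retrying once on detachment.
// getFrame is called before each attempt so the retry uses a freshly fetched frame.
async function evaluateWithRetry(getFrame, pageFunction) {
  for (let attempt = 0; attempt < 2; attempt++) {
    const frame = await getFrame()
    if (!frame) throw new Error('frame not found')
    try {
      return await frame.evaluate(pageFunction)
    } catch (err) {
      const detached = DETACH_ERRORS.some(msg => String(err.message).includes(msg))
      // rethrow unrelated errors immediately, and detachment errors on the second attempt.
      if (!detached || attempt === 1) throw err
    }
  }
}
```

With a real page, a call might look like evaluateWithRetry(() => page.frames().find(f => f.url().includes('embed')), () => document.title). Re-fetching inside getFrame matters because the old Frame object stays detached after a reload.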

The page.frames() fallback. When you cannot select the <iframe> element, for example an ad slot with no stable class, walk every frame on the page and match on URL. page.frames() returns a flat array of all frames including nested ones, so page.frames().find(f => f.url().includes('embed')) finds a frame by its src without ever touching the parent DOM. Many iframes inject after the parent's load event, so the frame may not be in that list yet; poll for it with await page.waitForFrame(f => f.url().includes('embed'), { timeout: 15000 }), or on older Puppeteer loop on page.frames() with a short delay until the match shows up. Because the array is flat, the parent/child structure is not represented by nesting, so recover the hierarchy with frame.parentFrame() and frame.childFrames() when you need to reach an iframe inside an iframe. A srcdoc frame has a document but its URL is about:srcdoc, so match it on that URL rather than its src. And since .find() returns only the first match, a page that embeds the same widget twice needs page.frames().filter(f => f.url().includes('widget')) indexed to the one you want, or a check on frame.name() to disambiguate.
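For the older-Puppeteer case, the polling loop might be sketched as follows. waitForFramesByUrl and frameDepth are hypothetical helpers, and the timeout and interval defaults are arbitrary choices; the only Puppeteer calls used are page.frames(), frame.url(), and frame.parentFrame(). The helper returns every match rather than the first, so a page that embeds the same widget twice can be indexed explicitly.

```javascript
// hypothetical helper: poll page.frames() until at least one frame URL contains `fragment`.
// resolves to an array of all matching frames, in the order page.frames() lists them.
async function waitForFramesByUrl(page, fragment, { timeout = 15000, interval = 250 } = {}) {
  const deadline = Date.now() + timeout
  while (true) {
    const matches = page.frames().filter(f => f.url().includes(fragment))
    if (matches.length > 0) return matches
    if (Date.now() >= deadline) {
      throw new Error(`no frame with URL containing "${fragment}" after ${timeout} ms`)
    }
    // short delay before looking again, since the frame may be injected late.
    await new Promise(resolve => setTimeout(resolve, interval))
  }
}

// hypothetical helper: recover nesting depth from the flat list by walking parentFrame().
// the main frame has depth 0, an iframe inside an iframe has depth 2.
function frameDepth(frame) {
  let depth = 0
  for (let f = frame.parentFrame(); f; f = f.parentFrame()) depth++
  return depth
}
```

A call such as const [first, second] = await waitForFramesByUrl(page, 'widget') then lets you pick the instance you want by position, and frameDepth(first) tells you whether it sits directly in the page or inside another frame.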

Use this when

You need data that lives inside an embedded document - a third-party comment widget, an embedded form, a payment or video frame, a live code demo, or an ad slot - and the parent page's selectors return nothing because the content is in a separate browsing context.

Skip this when

The content you want is actually in the main document and you only thought it was framed (check with page.frames().length first); the data arrives over an XHR or fetch the frame makes (intercept the request instead of reading the rendered DOM); the page has no JavaScript-injected frames and a plain HTTP fetch plus an HTML parser would do; or sandbox flags prevent the frame-side script you need, in which case Puppeteer cannot extract that data with this pattern.
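For the case where the data arrives over an XHR or fetch, a minimal interception sketch is shown below. captureJsonResponse, urlFragment, and trigger are hypothetical names, and the sketch assumes the frame's request shows up in page-level network events and returns JSON, which is worth verifying for your target. It uses page.waitForResponse with a predicate and response.json().

```javascript
// hypothetical helper: wait for a response whose URL contains `urlFragment`,
// run `trigger` to cause the request, and return the parsed JSON body.
async function captureJsonResponse(page, urlFragment, trigger, timeout = 15000) {
  // start waiting before triggering, so a fast response is not missed.
  const [response] = await Promise.all([
    page.waitForResponse(res => res.url().includes(urlFragment), { timeout }),
    trigger()
  ])
  return response.json()
}
```

Here trigger would be whatever causes the request, for example () => page.goto(url) or () => page.click('button.load-more'), with the URL and selector being placeholders for your own.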
