How to scrape data from a Shadow DOM

Updated 2026-06-25 · 6 min read

If you've pointed a selector at a page and got nothing back even though the value is sitting right there in the rendered DOM, the data is probably inside a shadow root. A web component attaches its own isolated DOM tree to a host element, and a plain document.querySelector stops at that boundary: the markup you see in DevTools under a #shadow-root (open) node is invisible to ordinary CSS selectors.

The solution is to use Puppeteer's shadow-aware selectors, which descend through the shadow boundary instead of stopping at it. We'll build a small script that loads a page rendering its content inside an open shadow root and reads it three ways: a deep combinator that crosses the boundary plain CSS stops at, a flatter pierce/ selector that searches every open shadow root without naming the host, and a hand-written walk of the shadow root inside the live page for attribute reads and explicit, level-by-level access to nested roots. It takes about 35 lines of Node.js and one open-source library.

The complete script

// scrape-shadow-dom.mjs
import puppeteer from 'puppeteer'

/*
  this demo builds its own open shadow DOM with setContent so the script is
  deterministic and runs anywhere. to scrape a live site, delete the markup
  and the setContent call and use: await page.goto('https://example.com').
  the selectors below are unchanged against any open shadow root.
*/
const markup = `
  <product-card>
    <template shadowrootmode="open">
      <h2 class="name">Wireless Headphones</h2>
      <span class="price" data-cents="8999">$89.99</span>
    </template>
  </product-card>`

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

// setContent parses declarative shadow roots, so the <template> becomes a real open shadow root.
await page.setContent(markup)

// 1. deep combinator: descend from the host into its open shadow root at any depth.
const name = await page.$eval('product-card >>> .name', el => el.textContent.trim())

// 2. pierce selector: match across every open shadow root in the document in one call.
const price = await page.$eval('pierce/.price', el => el.textContent.trim())

// 3. manual walk: read shadowRoot by hand when you need an attribute or a nested root.
const cents = await page.evaluate(() => {
  const host = document.querySelector('product-card')
  // host.shadowRoot is the ShadowRoot for an open host, or null for a closed one.
  return host.shadowRoot.querySelector('.price').getAttribute('data-cents')
})

console.log({ name, price, cents })

await browser.close()

Install Puppeteer and run the script:

npm install puppeteer
node scrape-shadow-dom.mjs

How it works

Load the page with an open shadow root. Puppeteer drives headless Chromium, which renders shadow DOM the same way a regular browser does. The demo uses page.setContent with a <template shadowrootmode="open"> so it has a real open shadow root without depending on a third-party site; on a live target you swap in page.goto(url) and the selectors below stay the same. Components that call attachShadow from JavaScript only get their shadow root once their definition script runs, usually after the initial HTML has parsed, so a query fired too early sees a bare host with no shadow content; wait for the inner node with await page.waitForSelector('product-card >>> .name') before reading. If DevTools shows #shadow-root (closed) rather than open, the component was created with attachShadow({ mode: 'closed' }): host.shadowRoot is null, and neither >>> nor pierce/ can enter it. In that case, look for the same data in a JSON blob the page already fetched or in an XHR response, or override Element.prototype.attachShadow before the component initializes to force open roots.
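A minimal sketch of both fixes, assuming the component attaches its root from its own script. A declarative <template shadowrootmode="closed"> is created by the HTML parser without calling attachShadow, so this patch cannot open it. forceOpenShadowRoots and loadShadowPage are hypothetical helper names; page.evaluateOnNewDocument, page.goto, and page.waitForSelector are standard Puppeteer Page methods.

```javascript
// Browser-side patch (hypothetical helper name). Passed to page.evaluateOnNewDocument
// so it runs in every new document before any page script, including the
// component's definition. It only affects roots created through attachShadow.
function forceOpenShadowRoots() {
  const original = Element.prototype.attachShadow
  Element.prototype.attachShadow = function (init) {
    // Keep the other options (delegatesFocus, slotAssignment) but force mode: 'open',
    // so host.shadowRoot is populated and >>> / pierce/ can enter the root.
    return original.call(this, { ...init, mode: 'open' })
  }
}

// Hypothetical helper: install the patch, navigate, then wait until the component
// has attached its root and rendered the node you want to read.
async function loadShadowPage(page, url, readySelector) {
  await page.evaluateOnNewDocument(forceOpenShadowRoots)
  await page.goto(url)
  return page.waitForSelector(readySelector)
}
```

Call it in place of page.goto, for example await loadShadowPage(page, 'https://example.com', 'product-card >>> .name'), then read with the same $eval calls as the main script.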

Read text with the deep combinator. product-card >>> .name selects .name at any depth inside the host's open shadow root. The >>> is Puppeteer's deep descendant combinator; a plain product-card .name returns nothing because standard CSS does not descend into a shadow tree. Use >>>> when you want only the host's immediate shadow root and not roots nested deeper. One case to watch for: text passed into a component as children and projected through a <slot> lives in the host's light DOM, not the shadow tree, so product-card >>> .label skips it; query the slotted node directly with product-card .label.
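When a field can land in either place, one option is to try the shadow tree first and fall back to the light DOM. readShadowOrSlotted is a hypothetical helper; it relies only on page.$, which resolves to null when nothing matches, and on the ElementHandle methods evaluate and dispose.

```javascript
// Hypothetical helper: read a field that may render inside the host's open
// shadow root or arrive as a light-DOM child projected through a <slot>.
async function readShadowOrSlotted(page, hostSelector, fieldSelector) {
  // >>> searches the host's shadow root and any roots nested under it.
  const inShadow = await page.$(`${hostSelector} >>> ${fieldSelector}`)
  // A plain descendant selector finds slotted children, which stay in light DOM.
  // The second query only runs when the shadow query found nothing.
  const handle = inShadow ?? (await page.$(`${hostSelector} ${fieldSelector}`))
  if (!handle) return null
  try {
    return await handle.evaluate(el => el.textContent.trim())
  } finally {
    // Release the browser-side reference once the text has been copied out.
    await handle.dispose()
  }
}
```

await readShadowOrSlotted(page, 'product-card', '.label') resolves to the trimmed text, or to null when neither query matches.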

Read another field with a pierce selector. pierce/.price matches every .price across all open shadow roots with no host prefix; page.$eval hands the first match to its callback, while page.$$eval receives them all. It is the flatter option when the field is unique on the page; the deep combinator is the one to reach for when you need to anchor the match under a specific host. On a listing with many cards, pierce/ mixes results across every root with no grouping, so iterate the hosts first and scope each read inside the loop, for example for (const card of await page.$$('product-card')) { await card.$eval(':scope >>> .price', el => el.textContent) }, so each value stays tied to its card.
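A sketch of both shapes side by side. allPrices uses page.$$eval with pierce/ for a flat list. scrapeCards iterates the hosts but, instead of the :scope >>> selector above, reads each card's shadowRoot directly inside card.evaluate, which is the manual shadowRoot walk described in the next step applied per card. Both function names, and the .name and .price classes taken from the demo markup, are assumptions for illustration.

```javascript
// Flat: every .price in any open shadow root, with no grouping by card.
async function allPrices(page) {
  return page.$$eval('pierce/.price', els => els.map(el => el.textContent.trim()))
}

// Grouped: one object per <product-card>, so each price stays with its name.
async function scrapeCards(page) {
  const cards = []
  for (const card of await page.$$('product-card')) {
    // This callback runs in the browser against the host element itself.
    const fields = await card.evaluate(host => {
      const root = host.shadowRoot // null if the root is closed or missing
      const text = sel => root?.querySelector(sel)?.textContent.trim() ?? null
      return { name: text('.name'), price: text('.price') }
    })
    cards.push(fields)
    await card.dispose()
  }
  return cards
}
```

On the demo page, scrapeCards returns a single object with the headphone name and price; a card whose root is closed comes back with null fields instead of throwing.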

Walk shadowRoot by hand for attributes and nested roots. Inside page.evaluate you have the live DOM, so host.shadowRoot.querySelector('.price').getAttribute('data-cents') reads the data-cents attribute, which the two text reads above leave untouched. Reaching for host.shadowRoot in the Node.js scope instead fails: there the host is a Puppeteer ElementHandle rather than a DOM element, so its shadowRoot is undefined and the chained .querySelector throws a TypeError. The property only exists on the browser-side DOM object, which is why the read has to happen inside the callback. The same manual path reaches a shadow root nested inside another shadow root: chain .shadowRoot once per level when you want to step through each host explicitly instead of letting >>> search every depth.
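A sketch of the chained walk as a reusable page function, assuming you know the host selector at each level. readThroughRoots is a hypothetical name. page.evaluate serializes the function and passes the extra arguments into the page, so it must not close over anything from the Node.js scope.

```javascript
// Runs in the browser via page.evaluate. Steps into one open shadow root per
// entry in hostPath, then reads an attribute from the final match.
function readThroughRoots(hostPath, targetSelector, attribute) {
  let scope = document
  for (const hostSelector of hostPath) {
    const host = scope.querySelector(hostSelector)
    // shadowRoot is null for a closed root or a host without one.
    if (!host || !host.shadowRoot) return null
    scope = host.shadowRoot
  }
  const target = scope.querySelector(targetSelector)
  return target ? target.getAttribute(attribute) : null
}
```

Against the demo page, await page.evaluate(readThroughRoots, ['product-card'], '.price', 'data-cents') returns the same '8999' string as the manual read in the script. A root two levels down takes a longer path, such as ['store-page', 'product-card'], where store-page is a hypothetical outer host.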

Use this when

You are scraping a site built from web components (Polymer, Lit, Stencil, or hand-rolled custom elements) and the values you want render under an open #shadow-root, where a normal selector returns nothing.

Skip this when

The shadow root is closed (look for the data in an XHR response or an inline JSON payload instead); the content is plain light-DOM markup (use ordinary CSS selectors); the page never runs JavaScript (a static fetch plus a parser is lighter than a browser); or you only need the article body as text (run a readability pass).
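For the closed-root case, a sketch of capturing JSON responses as the page loads, assuming the component's data arrives through a fetch or XHR that returns JSON. collectJsonResponses and its URL filter are hypothetical; page.on('response'), response.headers(), response.url(), and response.json() are standard Puppeteer APIs.

```javascript
// Hypothetical helper: record every JSON response whose URL passes `match`.
// Register it before page.goto so responses fired during load are not missed.
function collectJsonResponses(page, match = () => true) {
  const payloads = []
  page.on('response', async response => {
    const type = response.headers()['content-type'] || ''
    if (!type.includes('application/json') || !match(response.url())) return
    try {
      payloads.push({ url: response.url(), body: await response.json() })
    } catch {
      // Redirects and some other responses have no readable body; skip them.
    }
  })
  return payloads
}
```

Call const payloads = collectJsonResponses(page, url => url.includes('/api/')) before await page.goto(url), then inspect payloads once the page has settled. The '/api/' filter is a placeholder; copy the real path from the DevTools Network tab.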

Skip the code, just get the data Simplescraper turns any website into structured data in seconds.
