Simplescraper

How to scrape a page that requires scrolling to a specific element

Updated 2026-06-24 · 5 min read

If the page you're scraping returns the element you want but it comes back empty, you're probably reading it before the scroll that fills it has happened. The element sits below the fold, its rows or images are placeholders until something scrolls them near the viewport, and your script grabs the value too early and gets a blank string back. This is how lazy loading behaves on a page that ties its content to scroll position.

The solution is to move the headless Chrome viewport to the element so the page fires the same scroll and intersection events a person would, wait for the lazy content to arrive, and only then read the text. That triggers the page's own lazy-load and gives you the filled-in element instead of an empty one. It takes about 30 lines of Node.js with one library, Puppeteer.

Key terms

  • waitForSelector. A Puppeteer call that blocks until a node matching the selector is attached to the DOM, or until a timeout trips.
  • page.evaluate. A Puppeteer call that runs a function inside the page's own JavaScript context, so it has access to document and the live DOM.
  • scrollIntoView. A browser DOM method that scrolls the page until a given element is in the viewport, computing the position from the element rather than a fixed pixel offset.
  • IntersectionObserver. A browser API that fires a callback when an element enters or leaves the viewport, the trigger most lazy-load implementations use to fetch content.
  • Below the fold. The part of a page that sits past the initial visible viewport and is only seen after scrolling.
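
To see why an early read comes back empty, it can help to picture the page's side of the exchange. The sketch below is a generic illustration of the IntersectionObserver pattern, not the code of any particular site. `lazyLoadOnce` and `loadContent` are hypothetical names, and the third parameter exists only so the observer can be swapped out when running outside a browser.

```js
// Page-side sketch (meant for a browser, not Node): load content the first
// time a placeholder enters the viewport. loadContent is a hypothetical
// callback that fetches and renders the real rows or images.
function lazyLoadOnce(placeholder, loadContent, Observer = globalThis.IntersectionObserver) {
  const observer = new Observer((entries) => {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue
      observer.unobserve(entry.target) // fire once, then stop watching
      loadContent(entry.target)
    }
  })
  observer.observe(placeholder)
  return observer
}
```

Until the placeholder intersects the viewport, `loadContent` never runs, which is the state a scraper sees if it reads the element without scrolling first.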

Here is what the script does:

  • Launch headless Chrome with Puppeteer and open the target page.
  • Wait for the element you care about to exist in the DOM with waitForSelector, even if it starts far below the fold.
  • Scroll that exact element into the viewport by running scrollIntoView inside the page with page.evaluate.
  • Wait for the data the scroll triggers, then read the text out of the now-visible element.

The complete script

```js
// scroll-to-element.mjs
import puppeteer from 'puppeteer'

const url = 'https://news.ycombinator.com/'
const targetSelector = 'a.morelink' // the "More" link at the very bottom of the list

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

await page.goto(url, { waitUntil: 'networkidle2' })

// Wait for the element to exist in the DOM, even though it sits below the fold.
await page.waitForSelector(targetSelector, { timeout: 15000 })

// Scroll the exact element into view from inside the page, then settle the layout.
await page.evaluate((selector) => {
  const el = document.querySelector(selector)
  el.scrollIntoView({ block: 'center', behavior: 'instant' })
}, targetSelector)

// Give lazy content triggered by the scroll a moment to fetch and render.
await new Promise((resolve) => setTimeout(resolve, 1000))

// Read the element now that it is in view and any lazy content has loaded.
const text = await page.$eval(targetSelector, (el) => el.textContent.trim())
console.log(text)

await browser.close()
```

```bash
npm install puppeteer
node scroll-to-element.mjs
```

What each step does

Launch headless and open the page. puppeteer.launch({ headless: true }) starts a headless Chrome instance, so client-side JavaScript runs and scroll events fire the way they do for a user. waitUntil: 'networkidle2' returns once the page has had two or fewer in-flight requests for 500ms, which clears the initial load before you start scrolling.

Wait for the element to attach. waitForSelector(targetSelector) polls the DOM until the node exists or the timeout trips. It does not require the element to be visible, only present, which is exactly right when the element is rendered below the fold from the start. Set an explicit timeout so a missing element fails fast instead of hanging on the 30-second default.
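
One way to make the fail-fast behavior explicit is to wrap the wait in a small helper that separates "the element never showed up" from other failures. This is a sketch, not part of the script above: `waitForTarget` is a hypothetical name, and it assumes Puppeteer's timeout errors carry the name `TimeoutError`, which is worth confirming against your installed version.

```js
// Hypothetical helper: resolves true once the selector attaches to the DOM,
// false if the wait times out. Any other error is rethrown.
async function waitForTarget(page, selector, timeoutMs = 15000) {
  try {
    await page.waitForSelector(selector, { timeout: timeoutMs })
    return true
  } catch (err) {
    // Assumption: Puppeteer timeout errors have name === 'TimeoutError'.
    if (err && err.name === 'TimeoutError') return false
    throw err
  }
}
```

A caller could then log and skip the page when `await waitForTarget(page, 'a.morelink', 5000)` resolves to `false`, rather than letting the whole run crash.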

Scroll the element into view. Running el.scrollIntoView({ block: 'center' }) inside page.evaluate executes in the page's own context, so it moves Chrome's actual viewport. block: 'center' puts the element in the middle of the screen rather than flush against the top edge, which keeps it clear of sticky headers. Use behavior: 'instant' so the scroll completes before the next line runs.
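
If you want to confirm the scroll actually landed, one option is to measure the element right after scrolling and check that its bounding box overlaps the viewport. The sketch below is illustrative only. `overlapsViewport` and `scrollToCenter` are hypothetical names, and the null guard for a missing node is an addition the main script does not have.

```js
// Pure check: does a bounding rect overlap a viewport of the given height?
function overlapsViewport(rect, viewportHeight) {
  return rect.bottom > 0 && rect.top < viewportHeight
}

// Hypothetical helper: scrolls the element to the vertical center, then
// reports whether it ended up on screen. Returns false if no node matches.
async function scrollToCenter(page, selector) {
  const measured = await page.evaluate((sel) => {
    const el = document.querySelector(sel)
    if (!el) return null
    el.scrollIntoView({ block: 'center', behavior: 'instant' })
    const { top, bottom } = el.getBoundingClientRect()
    return { rect: { top, bottom }, viewportHeight: window.innerHeight }
  }, selector)
  return measured !== null && overlapsViewport(measured.rect, measured.viewportHeight)
}
```

Only plain numbers cross back from the page to Node here, because `page.evaluate` has to serialize whatever the in-page function returns.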

Wait for lazy content, then read. The scroll is what tells the page to fetch the rows, images, or comments tied to that element, and that fetch is asynchronous. A short fixed pause covers most pages; for a precise wait, swap the setTimeout for a second waitForSelector on a child element that only appears after the load. page.$eval then runs in the page and returns the text.
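
The swap from a fixed pause to a selector wait might look like the sketch below. It assumes the page adds a child node once the lazy load finishes. The helper name `readAfterLazyLoad` and any child selector you pass in (such as `.row-loaded`) are hypothetical and need to be replaced with whatever the target page actually renders.

```js
// Hypothetical helper: scroll the target into view, wait for a child that
// only exists after the lazy load, then return the target's trimmed text.
async function readAfterLazyLoad(page, targetSelector, loadedChildSelector, timeoutMs = 10000) {
  await page.evaluate((sel) => {
    document.querySelector(sel).scrollIntoView({ block: 'center', behavior: 'instant' })
  }, targetSelector)
  // Replaces the fixed setTimeout: resolves as soon as the child attaches.
  await page.waitForSelector(`${targetSelector} ${loadedChildSelector}`, { timeout: timeoutMs })
  return page.$eval(targetSelector, (el) => el.textContent.trim())
}
```

The wait now lasts only as long as the load does, and a load that never completes surfaces as a timeout error instead of a silently empty string.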

Gotchas

  • $eval runs before the element exists.

    • Issue: calling page.$eval(selector, ...) straight after goto throws "failed to find element matching selector" when the node has not been added to the DOM yet.
    • Fix: put await page.waitForSelector(selector) between goto and the read so the script blocks until the node attaches.
  • The element is in the DOM but its content is empty.

    • Issue: the wrapper element exists, so waitForSelector resolves, but its rows or images are placeholders that only fill in on scroll, so you read back an empty string.
    • Fix: scroll the element into view first, then waitForSelector a child that appears only after the lazy load, for example await page.waitForSelector(targetSelector + ' .row-loaded').
  • A fixed scrollTo pixel offset misses the element.

    • Issue: window.scrollTo(0, 3000) guesses a pixel position that breaks whenever the page height changes between runs, leaving the element off-screen so its scroll listener never fires.
    • Fix: target the element by selector with el.scrollIntoView() so the browser computes the correct position regardless of layout shifts.
  • Calling scrollIntoView from Node fails.

    • Issue: page.scrollIntoView(...) is not a method on the Puppeteer page object, so the call throws a TypeError, and the DOM method element.scrollIntoView() cannot run in Node because there is no DOM there.
    • Fix: wrap the call in page.evaluate((sel) => document.querySelector(sel).scrollIntoView(), selector) so it runs inside the browser.
  • One scroll is not enough on stepped infinite-scroll pages.

    • Issue: pages that load content in chunks per scroll need several scroll events, and a single scrollIntoView to an element that is not yet rendered finds nothing to scroll to.
    • Fix: scroll in a loop. Scroll to the bottom, wait, check whether the target selector has appeared, and repeat until it does or a maximum iteration count is reached.
  • The fixed pause is either too short or wasteful.

    • Issue: a hardcoded setTimeout(1000) flakes on a slow network and wastes time on a fast one, since the real fetch duration varies per page and run.
    • Fix: replace it with page.waitForSelector or page.waitForResponse keyed on the actual lazy request, so the script waits exactly as long as the load takes.
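
The bounded scroll loop described for stepped infinite-scroll pages could be sketched as follows. `scrollUntilPresent` is a hypothetical helper, and the `maxSteps` and `pauseMs` defaults are arbitrary starting points rather than recommended values. The pause between steps is still a fixed delay, so treat it as a simplification.

```js
// Hypothetical helper: scroll to the bottom in steps until the selector
// appears or maxSteps is reached. Returns true if the element was found.
async function scrollUntilPresent(page, selector, { maxSteps = 10, pauseMs = 500 } = {}) {
  for (let step = 0; step < maxSteps; step++) {
    // page.$ resolves to null when nothing matches.
    if (await page.$(selector)) return true
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight))
    await new Promise((resolve) => setTimeout(resolve, pauseMs))
  }
  // One last check after the final scroll.
  return (await page.$(selector)) !== null
}
```

Once this resolves to `true`, the element exists and the usual scrollIntoView and read steps apply. The iteration cap keeps the script from looping forever on a page that never renders the target.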

Use this when

A page renders an element below the fold whose content only loads, or only becomes readable, once it scrolls near the viewport, and you need to read that specific element rather than the whole page.

Skip this when

The whole page is already present in the server HTML (use a plain fetch plus a parser); the page loads endlessly on scroll and you want every item (use a scroll-to-bottom loop for infinite scroll); the content appears behind a "Load more" button rather than on scroll (click the button instead); or the data arrives through a background XHR you can call directly (hit that endpoint and skip the browser).
