How to scrape a page that requires scrolling to a specific element
If the page you're scraping returns the element you want but it comes back empty, you're probably reading it before the scroll that fills it has happened. The element sits below the fold, its rows or images are placeholders until something scrolls them near the viewport, and your script grabs the value too early and gets a blank string back. This is how lazy-load works on a page that hangs its content off scroll position.
The solution is to move the headless Chrome viewport to the element so the page fires the same scroll and intersection events a person would, wait for the lazy content to arrive, and only then read the text. That triggers the page's own lazy-load and gives you the filled-in element instead of an empty one. It takes about 35 lines of Node.js with one library, Puppeteer.
Key terms
- waitForSelector. A Puppeteer call that blocks until a node matching the selector is attached to the DOM, or until a timeout trips.
- page.evaluate. A Puppeteer call that runs a function inside the page's own JavaScript context, so it has access to document and the live DOM.
- scrollIntoView. A browser DOM method that scrolls the page until a given element is in the viewport, computing the position from the element rather than a fixed pixel offset.
- IntersectionObserver. A browser API that fires a callback when an element enters or leaves the viewport, the trigger most lazy-load implementations use to fetch content.
- Below the fold. The part of a page that sits past the initial visible viewport and is only seen after scrolling.
Here is what the script does:
- Launch headless Chrome with Puppeteer and open the target page.
- Wait for the element you care about to exist in the DOM with waitForSelector, even if it starts far below the fold.
- Scroll that exact element into the viewport by running scrollIntoView inside the page with page.evaluate.
- Wait for the data the scroll triggers, then read the text out of the now-visible element.
The complete script
```javascript
// scroll-to-element.mjs
import puppeteer from 'puppeteer'

const url = 'https://news.ycombinator.com/'
const targetSelector = 'a.morelink' // the "More" link at the very bottom of the list

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()
await page.goto(url, { waitUntil: 'networkidle2' })

// Wait for the element to exist in the DOM, even though it sits below the fold.
await page.waitForSelector(targetSelector, { timeout: 15000 })

// Scroll the exact element into view from inside the page, then settle the layout.
await page.evaluate((selector) => {
  const el = document.querySelector(selector)
  el.scrollIntoView({ block: 'center', behavior: 'instant' })
}, targetSelector)

// Give lazy content triggered by the scroll a moment to fetch and render.
await new Promise((resolve) => setTimeout(resolve, 1000))

// Read the element now that it is in view and any lazy content has loaded.
const text = await page.$eval(targetSelector, (el) => el.textContent.trim())
console.log(text)

await browser.close()
```

```shell
npm install puppeteer
node scroll-to-element.mjs
```

What each step does
Launch headless and open the page. puppeteer.launch({ headless: true }) starts a headless Chrome instance, so client-side JavaScript runs and scroll events fire the way they do for a user. waitUntil: 'networkidle2' returns once the page has had two or fewer in-flight requests for 500ms, which clears the initial load before you start scrolling.
Wait for the element to attach. waitForSelector(targetSelector) polls the DOM until the node exists or the timeout trips. It does not require the element to be visible, only present, which is exactly right when the element is rendered below the fold from the start. Set an explicit timeout so a missing element fails fast instead of hanging on the 30-second default.
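As a minimal sketch of failing fast, the helper below turns the timeout into a boolean so the caller can decide what a missing element means. The name waitForAttached is hypothetical. Matching on error.name assumes Puppeteer's TimeoutError reports its class name as the error name; an instanceof check against the TimeoutError export from puppeteer would be the stricter alternative.

```javascript
// Hypothetical helper: resolves true once the selector attaches, false on timeout.
// `page` is expected to be a Puppeteer Page.
async function waitForAttached(page, selector, timeout = 15000) {
  try {
    await page.waitForSelector(selector, { timeout })
    return true
  } catch (error) {
    // Assumption: Puppeteer timeouts carry the name 'TimeoutError'.
    if (error && error.name === 'TimeoutError') return false
    // Anything else (closed page, invalid selector) is a real failure.
    throw error
  }
}
```

A caller might log and exit when it returns false, rather than carrying on to scroll an element that never appeared.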
Scroll the element into view. Running el.scrollIntoView({ block: 'center' }) inside page.evaluate executes in the page's own context, so it moves Chrome's actual viewport. block: 'center' puts the element in the middle of the screen rather than flush against the top edge, which keeps it clear of sticky headers. Use behavior: 'instant' so the scroll completes before the next line runs.
Wait for lazy content, then read. The scroll is what tells the page to fetch the rows, images, or comments tied to that element, and that fetch is asynchronous. A short fixed pause covers most pages; for a precise wait, swap the setTimeout for a second waitForSelector on a child element that only appears after the load. page.$eval then runs in the page and returns the text.
Gotchas
$eval runs before the element exists.
- Issue: calling page.$eval(selector, ...) straight after goto throws "failed to find element matching selector" because the node has not been added to the DOM yet.
- Fix: put await page.waitForSelector(selector) between goto and the read so the script blocks until the node attaches.
The element is in the DOM but its content is empty.
- Issue: the wrapper element exists, so waitForSelector resolves, but its rows or images are placeholders that only fill in on scroll, so you read back an empty string.
- Fix: scroll the element into view first, then waitForSelector a child that appears only after the lazy load, for example await page.waitForSelector(targetSelector + ' .row-loaded').
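When no child selector reliably marks the loaded state, one alternative is to poll the element's own text with page.waitForFunction. This is a different call from the waitForSelector fix, shown here only as a sketch. It assumes the placeholder state has empty or whitespace-only text, and readWhenFilled is a hypothetical helper name.

```javascript
// Hypothetical helper: wait until the element has non-empty text, then return it.
// `page` is expected to be a Puppeteer Page; call this after scrolling the element into view.
async function readWhenFilled(page, selector, timeout = 15000) {
  await page.waitForFunction(
    // Runs inside the page on each poll until it returns a truthy value.
    (sel) => {
      const el = document.querySelector(sel)
      return el !== null && el.textContent.trim().length > 0
    },
    { timeout },
    selector
  )
  return page.$eval(selector, (el) => el.textContent.trim())
}
```

If the placeholder contains visible text such as "Loading", the predicate would need to test for that string instead of emptiness.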
A fixed scrollTo pixel offset misses the element.
- Issue: window.scrollTo(0, 3000) guesses a pixel position that breaks whenever the page height changes between runs, leaving the element off-screen so its scroll listener never fires.
- Fix: target the element by selector with el.scrollIntoView() so the browser computes the correct position regardless of layout shifts.
scrollIntoView called from Node fails.
- Issue: page.scrollIntoView(...) does not exist on the Puppeteer page object, and a DOM element's scrollIntoView() cannot run in Node because there is no DOM there.
- Fix: wrap the call in page.evaluate((sel) => document.querySelector(sel).scrollIntoView(), selector) so it runs inside the browser.
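One way to package that wrapper is a small helper that also guards against a selector that matches nothing, since calling scrollIntoView on null throws inside the page. This is a sketch; scrollToSelector is a hypothetical name, and the scroll options mirror the ones used in the complete script.

```javascript
// Hypothetical helper: scroll the first match of `selector` into view inside the page.
// Resolves true if an element was found and scrolled, false if nothing matched.
async function scrollToSelector(page, selector) {
  return page.evaluate((sel) => {
    // Everything in this callback runs in the browser, where `document` exists.
    const el = document.querySelector(sel)
    if (!el) return false
    el.scrollIntoView({ block: 'center', behavior: 'instant' })
    return true
  }, selector)
}
```

Returning a boolean lets the Node side decide whether to retry, wait, or give up without parsing an in-page exception.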
One scroll is not enough on stepped infinite-scroll pages.
- Issue: pages that load content in chunks per scroll need several scroll events, and a single scrollIntoView to an element that is not yet rendered finds nothing to scroll to.
- Fix: loop: scroll to the bottom, wait, check whether the target selector has appeared, and repeat until it does or a maximum iteration count is reached.
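The loop described in that fix can be sketched roughly as follows. The helper name scrollUntilPresent and the defaults for maxRounds and pauseMs are illustrative assumptions, not values taken from any particular site.

```javascript
// Hypothetical helper: scroll to the bottom repeatedly until `selector` exists.
// Resolves true if the element appeared, false if maxRounds was exhausted.
async function scrollUntilPresent(page, selector, { maxRounds = 20, pauseMs = 500 } = {}) {
  for (let round = 0; round < maxRounds; round++) {
    // page.$ resolves to null while nothing matches the selector.
    if ((await page.$(selector)) !== null) return true

    // Scroll to the current bottom so the page requests its next chunk.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight))

    // Give the chunk time to fetch and render before checking again.
    await new Promise((resolve) => setTimeout(resolve, pauseMs))
  }
  // Final check after the last scroll.
  return (await page.$(selector)) !== null
}
```

Once it resolves true, the element exists and can be scrolled into view and read as in the main script.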
The fixed pause is either too short or wasteful.
- Issue: a hardcoded setTimeout(1000) flakes on a slow network and wastes time on a fast one, since the real fetch duration varies per page and run.
- Fix: replace it with page.waitForSelector or page.waitForResponse keyed on the actual lazy request, so the script waits exactly as long as the load takes.
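A sketch of the page.waitForResponse variant, assuming the lazy request can be recognized by a substring of its URL. The urlFragment parameter and the helper name are hypothetical; the real value would come from watching the Network tab while scrolling. The listener is registered before the scroll so a fast response is not missed.

```javascript
// Hypothetical helper: scroll the element into view, wait for the lazy request, read the text.
// `page` is expected to be a Puppeteer Page; `urlFragment` identifies the lazy endpoint.
async function scrollAndWaitForLazyResponse(page, selector, urlFragment, timeout = 15000) {
  await Promise.all([
    // Listed first so the wait is active before the scroll triggers the request.
    page.waitForResponse(
      (response) => response.url().includes(urlFragment) && response.ok(),
      { timeout }
    ),
    page.evaluate((sel) => {
      document.querySelector(sel).scrollIntoView({ block: 'center', behavior: 'instant' })
    }, selector),
  ])
  return page.$eval(selector, (el) => el.textContent.trim())
}
```

The response arriving does not guarantee the page has rendered it yet, so a follow-up waitForSelector on the rendered content may still be needed on some pages.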
Use this when
A page renders an element below the fold whose content only loads, or only becomes readable, once it scrolls near the viewport, and you need to read that specific element rather than the whole page.
Skip this when
The whole page is already present in the server HTML (use a plain fetch plus a parser); the page loads endlessly on scroll and you want every item (use a scroll-to-bottom loop for infinite scroll); the content appears behind a "Load more" button rather than on scroll (click the button instead); or the data arrives through a background XHR you can call directly (hit that endpoint and skip the browser).