How to scrape a page to a full-page screenshot in Puppeteer

Updated 2026-06-24 · 5 min read

If you've taken a full-page screenshot of an image-heavy site and gotten back a tall band of grey blocks where the images should be, you've hit the lazy-loading trap. Many sites mark images below the fold with loading="lazy" or watch them with an IntersectionObserver, and only fetch them once they near the viewport. A headless browser does not scroll on its own, so those images are never requested, and the capture freezes them as empty placeholders.

The fix is to do the scrolling yourself before the capture. We'll build a small script that opens the page at a fixed viewport width so the layout is identical across runs, waits for the network to settle so late-loading content is present, and steps down the whole page so the lazy images start loading. It then scrolls back to the top so the shot starts where the document does, waits for the in-flight images to finish, and captures the whole scroll height to a PNG on disk. It comes to a few dozen lines of Node.js with one library, Puppeteer.

The complete script

// full-page-screenshot.mjs
import puppeteer from 'puppeteer'

const url = 'https://en.wikipedia.org/wiki/Web_scraping'

const browser = await puppeteer.launch({ headless: true })

try {
  const page = await browser.newPage()

  // a fixed viewport width makes the capture deterministic. fullPage overrides the
  // height, so only the width and the device scale factor matter here.
  await page.setViewport({ width: 1280, height: 800, deviceScaleFactor: 2 })

  // networkidle2 resolves when at most 2 connections are open for 500ms, which is
  // a good proxy for "the page has finished its initial data fetches".
  await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 })

  // lazy images load on scroll, so a top-of-page screenshot leaves them blank.
  // scroll the whole height in steps, pausing so each batch can start loading, then
  // return to the top so the capture starts where the page does.
  await page.evaluate(async () => {
    const step = 400
    const maxHeight = 100000 // cap so an infinite-scroll page cannot loop forever
    for (let y = 0; y < Math.min(document.documentElement.scrollHeight, maxHeight); y += step) {
      window.scrollTo(0, y)
      await new Promise(resolve => setTimeout(resolve, 150))
    }

    window.scrollTo(0, 0)
  })

  // wait for images that started loading during the scroll. addEventListener leaves
  // the page's own onload handlers alone, and the 10-second cap keeps one image that
  // never loads from hanging the script.
  await page.evaluate(() => Promise.all(
    Array.from(document.images)
      .filter(img => !img.complete)
      .map(img => new Promise(resolve => {
        const done = () => resolve()
        img.addEventListener('load', done, { once: true })
        img.addEventListener('error', done, { once: true })
        setTimeout(done, 10000)
      }))
  ))

  await page.screenshot({
    path: 'screenshot.png',
    fullPage: true
  })
  console.log('Saved screenshot.png')
} finally {
  // close the browser even if navigation or capture throws, so Node can exit.
  await browser.close()
}

Install Puppeteer and run the script:

npm install puppeteer
node full-page-screenshot.mjs

How it works

Launch headless and set the viewport width. puppeteer.launch({ headless: true }) runs Chrome without a window. setViewport pins the width to 1280, which fixes the responsive breakpoint the site renders at. deviceScaleFactor: 2 doubles the pixel density in each direction so the PNG is retina-sharp; dropping it to 1 cuts the pixel count to a quarter, and the file shrinks roughly in proportion. In a Linux container or CI runner, launch often fails, either because Chrome's shared-library dependencies are missing from the image or because Chrome cannot create its sandbox (typical when running as root). Install the dependencies Puppeteer lists for your base image, and in a trusted environment launch with args: ['--no-sandbox', '--disable-setuid-sandbox'].
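Rather than hard-coding the sandbox flags, you can add them only where they are needed. A minimal sketch, assuming a hypothetical PUPPETEER_NO_SANDBOX environment variable that you set yourself in CI; buildLaunchOptions is also a hypothetical name, and Puppeteer does not read that variable on its own.

```javascript
// Build launch options for puppeteer.launch(), adding the sandbox flags only
// when the (hypothetical) PUPPETEER_NO_SANDBOX variable is set to '1'.
function buildLaunchOptions(env = process.env) {
  const options = { headless: true, args: [] }
  if (env.PUPPETEER_NO_SANDBOX === '1') {
    // Only disable the sandbox in a container or CI job you trust.
    options.args.push('--no-sandbox', '--disable-setuid-sandbox')
  }
  return options
}
```

In the script, pass the result straight through: puppeteer.launch(buildLaunchOptions()). Local runs keep the sandbox, and only the pipeline that exports the variable turns it off.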

Wait for the network with networkidle2. waitUntil: 'networkidle2' holds goto until there have been no more than two open connections for at least 500 milliseconds. That is a better signal than 'load' for content that arrives over XHR or fetch after the initial document. The 60-second timeout is a ceiling, not a target. It can, however, resolve before a late @font-face file arrives, in which case text renders in a fallback face such as Times New Roman instead of the brand font; wait for fonts explicitly with await page.evaluate(() => document.fonts.ready) before capturing.
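The font wait fits naturally right after navigation. A minimal sketch, with gotoAndSettle as a hypothetical helper name and page being the Puppeteer Page from the script:

```javascript
// Navigate, wait for the network to go quiet, then wait for web fonts.
async function gotoAndSettle(page, url, timeout = 60000) {
  // Resolves once there have been no more than 2 connections for 500 ms.
  await page.goto(url, { waitUntil: 'networkidle2', timeout })
  // document.fonts.ready resolves once pending font loads have finished,
  // so text is not captured in a fallback face.
  await page.evaluate(async () => {
    await document.fonts.ready
  })
}
```

Calling await gotoAndSettle(page, url) in place of the bare page.goto keeps the two waits together, so a later edit cannot drop one without noticing.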

Scroll the full height, then return to the top. The first page.evaluate walks window.scrollTo down the document in 400-pixel steps and pauses 150 milliseconds after each one so lazy images can begin loading. Scrolling back to (0, 0) matters because sticky and fixed elements are positioned against the current scroll offset, and some headers change state once the page is scrolled. For stubborn sites whose images still render as blank grey blocks, set img.loading = 'eager' on every image inside a page.evaluate before scrolling. That only changes the browser's native lazy loading, not a script that swaps in the real URL from an IntersectionObserver callback.
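The eager-loading tweak can run as its own step between goto and the scroll pass. A sketch, assuming a hypothetical forceEagerImages helper; the optional srcAttr argument (for example 'data-src') is an assumption about how a particular site's script-based loader stores the real URL, since attribute names vary from site to site.

```javascript
// Switch native lazy images to eager and, if srcAttr is given, copy that
// attribute into src for script-driven loaders. Returns how many images changed.
async function forceEagerImages(page, srcAttr = null) {
  return page.evaluate(attr => {
    let touched = 0
    for (const img of document.querySelectorAll('img')) {
      let changed = false
      if (img.loading === 'lazy') {
        img.loading = 'eager'
        changed = true
      }
      const realSrc = attr ? img.getAttribute(attr) : null
      if (realSrc && img.getAttribute('src') !== realSrc) {
        img.setAttribute('src', realSrc)
        changed = true
      }
      if (changed) touched++
    }
    return touched
  }, srcAttr)
}
```

Use await forceEagerImages(page) for native lazy loading, or await forceEagerImages(page, 'data-src') on a site you have checked uses that attribute. The returned count is a quick way to see whether the tweak did anything on a given page.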

Wait for in-flight images, then capture. The second page.evaluate resolves once every entry in document.images that is not already complete has fired its load or error event, so the screenshot does not race a half-decoded image. page.screenshot({ fullPage: true, path }) writes the full-height PNG. Three things bite at capture time with fullPage: true.

First, an element styled position: fixed or position: sticky can paint at several scroll offsets and stamp the same nav bar down the image. Neutralize only those elements, for example with page.addStyleTag({ content: 'header { position: static !important }' }), swapping in the site's own selector; a blanket *{position: static !important} also flattens every absolutely positioned element and breaks the layout.

Second, a spinner, fade-in, or auto-rotating carousel freezes at whatever frame it happened to reach, so repeat runs differ. Disable CSS motion with page.addStyleTag({ content: '*, *::before, *::after { animation: none !important; transition: none !important }' }); a carousel advanced by a JavaScript timer is not stopped by this and needs pausing or hiding separately.

Third, Chrome silently truncates a single screenshot near 16,384 pixels tall at the GPU texture limit, so a very long page needs capturing in viewport-height slices stitched together with sharp, or shooting specific element handles with element.screenshot().

A full-page PNG at deviceScaleFactor: 2 can also run to tens of megabytes, so capture JPEG with { type: 'jpeg', quality: 80 } for photographs, write { type: 'webp' } directly, or post-process with sharp to resize and convert to WebP.
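For a page past the height limit, one way to slice and stitch is to scroll the viewport one screen at a time, take an ordinary viewport screenshot at each stop, and composite the slices with sharp. A sketch under stated assumptions: sliceOffsets, pinFixedAndSticky, and captureInSlices are hypothetical names; it assumes setViewport has been called, the lazy-image pass has already run, and sharp is installed (npm install sharp). It switches fixed elements to absolute and sticky ones to static, which usually leaves a header once at the top of the document, but check the result on your target site.

```javascript
// Scroll offsets in CSS pixels that tile totalHeight with viewport-height slices.
// The last offset is clamped to the bottom, so that slice may overlap the one before.
function sliceOffsets(totalHeight, viewportHeight) {
  const maxOffset = Math.max(0, totalHeight - viewportHeight)
  const offsets = []
  for (let y = 0; y < maxOffset; y += viewportHeight) offsets.push(y)
  offsets.push(maxOffset)
  return offsets
}

// Make fixed elements absolute and sticky ones static so they paint once
// instead of in every slice.
async function pinFixedAndSticky(page) {
  await page.evaluate(() => {
    for (const el of document.querySelectorAll('body *')) {
      const { position } = getComputedStyle(el)
      if (position === 'fixed') el.style.setProperty('position', 'absolute', 'important')
      else if (position === 'sticky') el.style.setProperty('position', 'static', 'important')
    }
  })
}

async function captureInSlices(page, outPath) {
  // Loaded lazily so the helpers above work without sharp installed.
  const { default: sharp } = await import('sharp')
  await pinFixedAndSticky(page)

  // Assumes page.setViewport() was called, as in the script above.
  const { height: viewportHeight, deviceScaleFactor = 1 } = page.viewport()
  const totalHeight = await page.evaluate(() => document.documentElement.scrollHeight)

  const layers = []
  for (const offset of sliceOffsets(totalHeight, viewportHeight)) {
    // Scroll, then read back where the browser actually landed.
    const scrollY = await page.evaluate(y => {
      window.scrollTo(0, y)
      return window.scrollY
    }, offset)
    // Without clip or fullPage, page.screenshot() captures only the current viewport.
    const input = Buffer.from(await page.screenshot())
    layers.push({ input, top: Math.round(scrollY * deviceScaleFactor), left: 0 })
  }

  // Size the canvas from the slices themselves so every layer fits inside it.
  let width = 0
  let height = 0
  for (const layer of layers) {
    const { width: w, height: h } = await sharp(layer.input).metadata()
    width = Math.max(width, w)
    height = Math.max(height, layer.top + h)
  }

  await sharp({ create: { width, height, channels: 4, background: '#ffffff' } })
    .composite(layers)
    .png()
    .toFile(outPath)

  await page.evaluate(() => window.scrollTo(0, 0))
}
```

Call await captureInSlices(page, 'screenshot.png') in place of the final page.screenshot. Reading window.scrollY back after each scroll matters because the browser clamps the last scroll to the bottom of the document; compositing at the real offset overlaps that slice with the previous one instead of leaving a gap or a seam.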

Use this when

You want a pixel-accurate image of a full page for visual regression testing, archiving, link-preview cards, change monitoring, or PDF-style records of a live URL. One image per page, including everything below the fold.

Skip this when

You only need the text or article body (convert to Markdown instead); you need just one component rather than the whole page (use element.screenshot() on a handle); the page is taller than Chrome's screenshot limit (slice and stitch, or render to PDF); or you are capturing thousands of URLs and want a hosted endpoint rather than running a browser pool yourself.

Skip the code, just get the data Simplescraper turns any website into structured data in seconds.
