Simplescraper

How to scrape a page to a full-page screenshot in Puppeteer


Updated 2026-06-24 · 5 min read

If you've taken a full-page screenshot of an image-heavy site and gotten back a tall band of grey blocks where the images should be, you've hit the lazy-loading trap. Many sites mark below-the-fold images with loading="lazy" or watch them with an IntersectionObserver, and only fetch them once they near the viewport. A headless browser does not scroll on its own, so those images are never requested, and the capture freezes them as empty placeholders.

The fix is to do the scrolling yourself before the capture: step down the whole page so the lazy images start loading, wait for the in-flight images to finish, then scroll back to the top so the shot starts where the document does. The result is the full page with its below-the-fold images in place, from a few dozen lines of Node.js and one library, Puppeteer.

Key terms

  • Lazy loading. A page technique that defers fetching images or content until they near the viewport, which is why an unscrolled headless capture leaves them blank.
  • IntersectionObserver. A browser API that fires a callback when an element enters the viewport, a common trigger JavaScript lazy-load libraries use to start fetching.
  • networkidle2. A Puppeteer wait condition that resolves once there have been no more than two open connections for 500 milliseconds, used here as a proxy for the page finishing its initial fetches.
  • deviceScaleFactor. The pixel-density multiplier for the capture. A value of 2 doubles both width and height (four times the pixels) for a retina-sharp image, at the cost of file size.
  • fullPage. The page.screenshot option that captures the entire scroll height of the document rather than just the visible viewport.

Here is what the script does:

  • Launch headless Chrome with Puppeteer and open a page at a fixed viewport width, so the layout is deterministic across runs.
  • Wait for the network to settle with waitUntil: 'networkidle2', so the page's initial fetches have largely finished.
  • Scroll the page from top to bottom in steps to fire the lazy-load and IntersectionObserver triggers, then scroll back to the top.
  • Wait for any images that are still loading to finish.
  • Capture the whole scroll height with page.screenshot({ fullPage: true }) and write the PNG to disk.

The complete script

```js
// full-page-screenshot.mjs
import puppeteer from 'puppeteer'

const url = 'https://en.wikipedia.org/wiki/Web_scraping'

const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

// A fixed viewport width makes the capture deterministic. fullPage overrides the
// height, so only the width and the device scale factor matter here.
await page.setViewport({ width: 1280, height: 800, deviceScaleFactor: 2 })

// networkidle2 resolves when at most 2 connections are open for 500ms, which is
// a good proxy for "the page has finished its initial data fetches".
await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 })

// Lazy images load on scroll, so a top-of-page screenshot leaves them blank.
// Scroll the whole height in steps, pausing so each batch can start loading,
// then return to the top so the capture starts where the page does.
await page.evaluate(async () => {
  const step = 400
  // The height is re-read on every pass, so content added mid-scroll is covered.
  for (let y = 0; y < document.documentElement.scrollHeight; y += step) {
    window.scrollTo(0, y)
    await new Promise(resolve => setTimeout(resolve, 150))
  }

  window.scrollTo(0, 0)
})

// Wait for images still loading after the scroll to fire load or error.
// addEventListener leaves the page's own handlers intact, and the 10-second
// cap stops a lazy image that never entered the viewport from hanging the run.
await page.evaluate(() => {
  const pending = Array.from(document.images)
    .filter(img => !img.complete)
    .map(img => new Promise(resolve => {
      const done = () => resolve()
      img.addEventListener('load', done, { once: true })
      img.addEventListener('error', done, { once: true })
    }))
  const cap = new Promise(resolve => setTimeout(resolve, 10000))
  return Promise.race([Promise.all(pending), cap])
})

await page.screenshot({
  path: 'screenshot.png',
  fullPage: true
})

await browser.close()
console.log('Saved screenshot.png')
```
```bash
npm install puppeteer
node full-page-screenshot.mjs
```

What each step does

Launch headless and set the viewport width. puppeteer.launch({ headless: true }) runs Chrome without a window. setViewport pins the width to 1280, which fixes which responsive breakpoint the site renders at. deviceScaleFactor: 2 doubles the pixel density in both directions, so the PNG is retina-sharp but holds four times the pixels; drop it to 1 for a much smaller file.
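The cost of deviceScaleFactor is easy to check with arithmetic: the factor multiplies both width and height, so the pixel count grows with its square. The helper below is a hypothetical illustration, not part of Puppeteer, and its byte figure is the uncompressed RGBA size rather than the compressed PNG size on disk.

```javascript
// Hypothetical helper: pixel dimensions of a capture for a given CSS size
// and deviceScaleFactor. bytesRGBA is the uncompressed size (4 bytes per
// pixel); the PNG written to disk is compressed and usually much smaller.
function capturePixels(cssWidth, cssHeight, deviceScaleFactor) {
  const width = Math.round(cssWidth * deviceScaleFactor)
  const height = Math.round(cssHeight * deviceScaleFactor)
  return { width, height, pixels: width * height, bytesRGBA: width * height * 4 }
}

// A 1280 x 6000 CSS-pixel page at factor 2 versus factor 1:
const retina = capturePixels(1280, 6000, 2) // 2560 x 12000
const standard = capturePixels(1280, 6000, 1) // 1280 x 6000
console.log(retina.pixels / standard.pixels) // 4
```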

Wait for the network with networkidle2. waitUntil: 'networkidle2' holds goto until there have been no more than two open connections for 500 milliseconds. That is a better signal than 'load' for content that arrives over XHR after the initial document. The 60-second timeout is a ceiling, not a target: if the page never settles, goto rejects with a TimeoutError.
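A page that keeps several connections open (long polling, streaming ads) may never reach networkidle2, and the script then stops at goto. One option, sketched here as an assumption rather than part of the original script, is to catch the timeout and capture whatever has rendered, which may be little or nothing if the document itself never arrived. The name gotoSettled is hypothetical; the check relies on Puppeteer naming its timeout errors 'TimeoutError'.

```javascript
// Hypothetical helper: navigate with networkidle2, but if the page never
// settles within the timeout, carry on with what has rendered so far.
// Returns true if the page settled, false if it timed out.
async function gotoSettled(page, url, timeout = 60000) {
  try {
    await page.goto(url, { waitUntil: 'networkidle2', timeout })
    return true
  } catch (err) {
    // Rethrow anything that is not a timeout (DNS failures, net::ERR_*),
    // because in those cases there is nothing worth capturing.
    if (err.name !== 'TimeoutError') throw err
    console.warn(`networkidle2 not reached after ${timeout}ms; capturing anyway`)
    return false
  }
}
```

In the script, `const settled = await gotoSettled(page, url)` would replace the bare goto call, and settled can be logged next to the saved file.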

Scroll the full height, then return to the top. The first page.evaluate walks window.scrollTo down the document in 400-pixel steps and pauses 150 milliseconds after each one so lazy images can start loading. Scrolling back to (0, 0) matters because fixed and sticky elements are positioned relative to the current scroll position, and some pages restyle their header once the user has scrolled.
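Because the loop re-reads the page height on every pass, it follows content that grows as it loads, but on an infinite-scroll feed it never ends. Below is a sketch of a capped variant. The option names step, delay, and maxSteps are illustrative, not Puppeteer options, and the browser-side function is kept separate so it can be passed straight to page.evaluate.

```javascript
// Runs inside the page via page.evaluate, so it may only use browser globals
// and its serialized argument, not variables from the Node script.
async function scrollThroughPage({ step = 400, delay = 150, maxSteps = 200 } = {}) {
  let steps = 0
  // Re-read the height each pass so content loaded mid-scroll is covered,
  // but stop after maxSteps so an infinite feed cannot loop forever.
  for (let y = 0; y < document.documentElement.scrollHeight && steps < maxSteps; y += step) {
    window.scrollTo(0, y)
    steps++
    await new Promise(resolve => setTimeout(resolve, delay))
  }
  window.scrollTo(0, 0)
  return steps
}

// Node side: Puppeteer serializes the function and passes the options object.
async function autoScroll(page, options) {
  return page.evaluate(scrollThroughPage, options)
}
```

Calling `await autoScroll(page, { maxSteps: 100 })` caps the walk at 100 steps of 400 pixels, roughly 40,000 CSS pixels of page.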

Wait for in-flight images, then capture. The second page.evaluate waits until every entry in document.images that is not already complete has fired its load or error event, so the screenshot does not race a half-loaded image. page.screenshot({ fullPage: true, path }) then writes the full-height PNG.
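Even after that wait, some images can still be missing: a request can fail, or a lazy image can sit in a hidden section the scroll never reveals. A short check before the capture, sketched with the hypothetical names findMissingImages and reportMissingImages, lists images that are either still incomplete or finished with no pixels (a broken image reports a naturalWidth of 0). Treat the list as a hint rather than a verdict.

```javascript
// Runs in the page: collect the src of every image that is still loading or
// that finished without decoding any pixels (a broken image has naturalWidth 0).
function findMissingImages() {
  return Array.from(document.images)
    .filter(img => !img.complete || img.naturalWidth === 0)
    .map(img => img.currentSrc || img.src)
}

// Node side: log the misses so a blank block in the PNG has an explanation.
async function reportMissingImages(page) {
  const missing = await page.evaluate(findMissingImages)
  if (missing.length > 0) {
    console.warn(`${missing.length} image(s) not loaded:`, missing)
  }
  return missing
}
```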

Gotchas

  • Lazy-loaded images render as blank grey blocks.

    • Issue: A screenshot taken straight after goto shows placeholders for the images below the fold, because loading="lazy" and IntersectionObserver only fetch images that enter the viewport, and a headless browser does not scroll on its own.
    • Fix: scroll the full height before capturing, as the script does, then await the in-flight images. For stubborn sites, also set img.loading = 'eager' on every image inside a page.evaluate before scrolling.
  • A fixed or sticky header repeats down the screenshot.

    • Issue: With fullPage: true, an element styled position: fixed or position: sticky can paint at several scroll offsets, leaving the same nav bar stamped multiple times down the image.
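    • Fix: neutralize it before the capture with page.addStyleTag({ content: '*{position: static !important}' }). That rule also flattens absolutely positioned elements such as overlays and dropdowns, so where you can, target the specific header selector rather than every element.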
  • The capture is cut off at roughly 16,384 pixels tall.

    • Issue: Chrome caps a single screenshot's height near the GPU texture limit, so a very long page is silently truncated at the bottom with no error.
    • Fix: capture in viewport-height slices and stitch them with sharp, or screenshot specific element handles with element.screenshot() instead of the whole document.
  • Custom web fonts have not loaded, so text renders in a fallback face.

    • Issue: networkidle2 can resolve before a late @font-face file arrives, so the screenshot shows a fallback face such as Times New Roman where the brand font should be.
    • Fix: wait for fonts explicitly before capturing: await page.evaluate(() => document.fonts.ready).
  • Animations and carousels are caught mid-transition.

    • Issue: A spinner, fade-in, or auto-rotating carousel is frozen at a random frame, so repeat runs of the same URL produce visibly different images.
    • Fix: disable motion before the shot with page.addStyleTag({ content: '*{animation: none !important; transition: none !important}' }).
  • The PNG is tens of megabytes on a long, image-heavy page.

    • Issue: A full-page PNG at deviceScaleFactor: 2 on a long article can run to many megabytes, which is slow to store and serve.
    • Fix: capture JPEG with { type: 'jpeg', quality: 80 } for photographs, or post-process the PNG with sharp to resize and convert to WebP.
  • Chrome will not start in a Linux container or CI runner.

    • Issue: puppeteer.launch fails with a sandbox or shared-library error because the container is missing Chrome's system dependencies or cannot create a sandbox.
    • Fix: install the dependencies Puppeteer lists for your base image and launch with args: ['--no-sandbox', '--disable-setuid-sandbox'] in a trusted environment.
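The img.loading = 'eager' fix for lazy images can be wrapped in a small browser-side function and run after goto, before the scroll. This is a sketch with hypothetical names. It only affects native loading="lazy" images; images swapped in by a JavaScript lazy loader still need the scroll.

```javascript
// Runs in the page: switch every native lazy image to eager so the browser
// fetches it without waiting for it to near the viewport.
// Returns how many images were changed.
function forceEagerImages() {
  const lazy = document.querySelectorAll('img[loading="lazy"]')
  lazy.forEach(img => { img.loading = 'eager' })
  return lazy.length
}

// Node side: call it after goto and before the scroll loop.
async function makeImagesEager(page) {
  return page.evaluate(forceEagerImages)
}
```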
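The targeted version of the sticky-header fix changes only the elements that misbehave. The helper name unstick is hypothetical, and the selector in the usage line, header, .site-nav, is a placeholder; inspect the page to find the real one.

```javascript
// Hypothetical helper: un-stick only the given selector instead of rewriting
// position on every element, which can break overlays and dropdowns.
async function unstick(page, selector) {
  await page.addStyleTag({
    content: `${selector} { position: static !important; }`,
  })
}

// Usage, with a placeholder selector, just before page.screenshot:
// await unstick(page, 'header, .site-nav')
```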
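For pages past the height limit, the slice-and-stitch fix can look roughly like this. It is a sketch under several assumptions: sharp is installed separately (npm install sharp), setViewport has already been called, the window (not an inner container) scrolls, the page height does not change between slices, and fixed headers will repeat in every slice unless you neutralize them first. sliceOffsets and stitchFullPage are hypothetical names.

```javascript
// Scroll offsets (CSS px) for viewport-height slices. The last offset is
// clamped to the bottom so the final slice ends exactly at the page end;
// it may overlap the previous slice, which the composite simply overwrites.
function sliceOffsets(pageHeight, viewportHeight) {
  const maxScroll = Math.max(0, pageHeight - viewportHeight)
  const offsets = []
  for (let y = 0; y < maxScroll; y += viewportHeight) offsets.push(y)
  offsets.push(maxScroll)
  return offsets
}

async function stitchFullPage(page, outPath) {
  // Loaded on demand so sliceOffsets works without sharp installed.
  const { default: sharp } = await import('sharp')
  const { width, height, deviceScaleFactor = 1 } = page.viewport()
  const pageHeight = await page.evaluate(() => document.documentElement.scrollHeight)
  const scale = px => Math.round(px * deviceScaleFactor)

  const layers = []
  for (const y of sliceOffsets(pageHeight, height)) {
    await page.evaluate(top => window.scrollTo(0, top), y)
    // Read back the real offset in case the browser clamped the scroll.
    const actualY = await page.evaluate(() => window.scrollY)
    // Viewport-only capture at the current scroll position.
    const shot = await page.screenshot({ captureBeyondViewport: false })
    layers.push({ input: Buffer.from(shot), left: 0, top: scale(actualY) })
  }

  await sharp({
    create: {
      width: scale(width),
      // A page shorter than the viewport still yields one full-viewport slice.
      height: scale(Math.max(pageHeight, height)),
      channels: 4,
      background: { r: 255, g: 255, b: 255, alpha: 1 },
    },
  })
    .composite(layers)
    .png()
    .toFile(outPath)
}
```

With the script's 800-pixel viewport, a 2,000-pixel page is captured at offsets 0, 800, and 1200; the last slice overlaps the second by 400 pixels so the image ends flush with the page.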
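The font and animation fixes both run just before page.screenshot, so they combine naturally into one step. This sketch uses a hypothetical settleForCapture name and extends the animation rule to ::before and ::after, which the * selector alone does not match.

```javascript
// Hypothetical helper: freeze motion and wait for web fonts right before the
// capture, so repeat runs of the same URL render the same frame and typeface.
async function settleForCapture(page) {
  await page.addStyleTag({
    content: '*, *::before, *::after { animation: none !important; transition: none !important; }',
  })
  // document.fonts.ready resolves once pending font loads have finished.
  await page.evaluate(async () => {
    await document.fonts.ready
  })
}
```

One caveat: an element whose visible state comes only from an animation's end frame (a fade-in that relies on fill-mode forwards) renders in its starting state once animations are removed, so check pages that fade content in.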
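Both file-size fixes are sketched below. The JPEG path uses Puppeteer's own type and quality screenshot options. The WebP path assumes sharp is installed, and its 1280-pixel target width is an illustrative value. WebP caps each dimension at 16,383 pixels, so very tall captures may need to stay JPEG or PNG.

```javascript
// Capture straight to JPEG: lossy, but far smaller than PNG for photos.
async function captureJpeg(page, path, quality = 80) {
  await page.screenshot({ path, fullPage: true, type: 'jpeg', quality })
}

// Post-process an existing PNG: downscale (aspect ratio is preserved when
// only width is given) and re-encode as WebP with sharp.
async function pngToWebp(pngPath, webpPath, width = 1280) {
  const { default: sharp } = await import('sharp') // needs `npm install sharp`
  await sharp(pngPath).resize({ width }).webp({ quality: 80 }).toFile(webpPath)
}
```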
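The container fix is a change to the launch options. One way to keep Chrome's sandbox on everywhere else is to drop it only when the environment says you are in a trusted container. TRUSTED_CONTAINER below is a hypothetical variable name of your own choosing, not something Puppeteer reads.

```javascript
// Build options for puppeteer.launch(). The Chrome sandbox is disabled only
// when the (hypothetical) TRUSTED_CONTAINER variable is set to "1".
function launchOptions(env = process.env) {
  const options = { headless: true }
  if (env.TRUSTED_CONTAINER === '1') {
    options.args = ['--no-sandbox', '--disable-setuid-sandbox']
  }
  return options
}

// Usage: const browser = await puppeteer.launch(launchOptions())
```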

Use this when

You want a pixel-accurate image of a full page for visual regression testing, archiving, link-preview cards, change monitoring, or PDF-style records of a live URL. One image per page, including everything below the fold.

Skip this when

You only need the text or article body (convert to Markdown instead); you need just one component rather than the whole page (use element.screenshot() on a handle); the page is taller than Chrome's screenshot limit (slice and stitch, or render to PDF); or you are capturing thousands of URLs and want a hosted endpoint rather than running a browser pool yourself.
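Two of those alternatives use Puppeteer calls that are not in the main script: element.screenshot() on a handle for a single component, and page.pdf() for a page too tall to screenshot. A sketch of both, where the function names, selector, and output paths are placeholders:

```javascript
// Capture one component instead of the whole document. page.$ returns null
// when nothing matches, so fail loudly rather than silently writing no file.
async function captureElement(page, selector, path) {
  const element = await page.$(selector)
  if (!element) throw new Error(`No element matches ${selector}`)
  await element.screenshot({ path })
}

// Render the page to PDF instead of a bitmap; printBackground keeps
// background colors and images, which are omitted by default.
async function captureAsPdf(page, path) {
  await page.pdf({ path, printBackground: true })
}
```

page.pdf renders with print styles; calling `await page.emulateMediaType('screen')` first makes the PDF follow the on-screen layout instead.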
