April 28, 2026 | 6 min read

How PDF Compression Works

A walkthrough of what happens between "I uploaded my PDF" and "here is your smaller file" - in language anyone can follow.

A PDF Is Really a Bundle of Objects

To understand compression, it helps to know what a PDF actually is. Behind the scenes, a PDF file is a bundle of small objects, each with a number:

  • Page objects describe page size and which other objects belong on that page.
  • Text objects store the words and where they sit.
  • Image objects store every photo, scan, and graphic.
  • Font objects store the typefaces used so the document looks the same on every device.
  • Stream objects are the raw data buckets - this is where most of the file size lives.

Compression touches every category, but it does most of its work on two of them: images and streams.
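Here is what "each with a number" looks like in practice. This is a minimal Python sketch that lists the numbered objects in a tiny, hand-written PDF fragment. The sample bytes and the `list_objects` helper are made up for illustration. A real parser finds objects through the cross-reference table rather than a regex, and labeling objects by their `/Type` entry is only a rough shortcut.

```python
import re

# A hand-written fragment of a PDF body, for illustration only.
# Real files also have a header, a cross-reference table, and a trailer.
SAMPLE_BODY = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R >>
endobj
4 0 obj
<< /Length 36 >>
stream
BT /F1 24 Tf 72 700 Td (Hello) Tj ET
endstream
endobj
"""

# "N G obj ... endobj": N is the object number, G the generation number.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)


def list_objects(body: bytes) -> dict[int, str]:
    """Map each object number to a rough label based on its contents."""
    objects = {}
    for match in OBJ_RE.finditer(body):
        number = int(match.group(1))
        content = match.group(3)
        type_match = re.search(rb"/Type\s*/(\w+)", content)
        if type_match:
            objects[number] = type_match.group(1).decode()
        elif b"stream" in content:
            objects[number] = "Stream"
        else:
            objects[number] = "Other"
    return objects


print(list_objects(SAMPLE_BODY))
# {1: 'Catalog', 2: 'Pages', 3: 'Page', 4: 'Stream'}
```

Notice that the words "Hello" live inside stream object 4 as drawing instructions. That is one reason streams account for so much of a file's size.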

Step 1: Read the PDF and Inventory Everything

The first thing the compressor does is scan the file and build a list of every object inside. This sounds boring, but it is where the first easy wins come from. Almost every PDF contains:

  • Orphaned objects no page actually references.
  • Old image thumbnails that viewers no longer need.
  • Multiple copies of the same font.
  • Edit history left behind by Word or Acrobat.

Throwing this junk out can shrink a PDF by 5-20% before anything "clever" happens.
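Finding orphaned objects is essentially a reachability check. You start at the document's root object, follow every reference, and anything you never reach is junk. Here is a minimal sketch that assumes a hypothetical in-memory table of object dictionaries. A real tool would read the root object number from the file's trailer and parse objects properly instead of pattern-matching text.

```python
import re
from collections import deque

# Hypothetical object table: object number -> raw dictionary text.
# Object 5 is an old thumbnail image that nothing points to any more.
OBJECTS = {
    1: "<< /Type /Catalog /Pages 2 0 R >>",
    2: "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    3: "<< /Type /Page /Parent 2 0 R /Contents 4 0 R >>",
    4: "<< /Length 36 >>",
    5: "<< /Type /XObject /Subtype /Image /Width 64 /Height 64 >>",
}

# An indirect reference looks like "N G R", e.g. "2 0 R".
REF_RE = re.compile(r"(\d+)\s+\d+\s+R\b")


def find_orphans(objects: dict[int, str], root: int) -> set[int]:
    """Return object numbers that cannot be reached from the root object."""
    reachable = {root}
    queue = deque([root])
    while queue:  # breadth-first walk over references
        current = queue.popleft()
        for ref in REF_RE.findall(objects.get(current, "")):
            target = int(ref)
            if target in objects and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return set(objects) - reachable


print(find_orphans(OBJECTS, root=1))  # {5}
```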

Step 2: Rewrite the Images

Images are almost always the biggest objects in a PDF. A single 12-megapixel photo embedded at full resolution can be larger than the rest of the document combined. The compressor handles them in two passes:

  1. Downsample. If a photo is 4000 pixels wide but the page only needs 800, the extra pixels are wasted. The compressor shrinks the image to what the page can actually display.
  2. Re-encode. The image is saved again with a more efficient encoder (typically JPEG or JPEG2000) at a lower quality setting. The "quality setting" is the dial we move when you pick Low / Medium / High.
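The downsampling pass comes down to two pieces of arithmetic. The first is how many pixels the page actually needs. The second is how to merge the extra pixels together. The sketch below assumes a 200 DPI target and a grayscale image stored as a list of rows. `target_width` and `downsample` are hypothetical helpers, and simple block averaging stands in for the smoother resampling filters real compressors tend to use. Re-encoding to JPEG is left out because it needs an image library.

```python
def target_width(display_inches: float, dpi: int) -> int:
    """Pixels needed to show an image `display_inches` wide at `dpi` dots per inch."""
    return round(display_inches * dpi)


def downsample(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Shrink a grayscale image by averaging each factor x factor block into one pixel."""
    height = len(pixels) // factor
    width = len(pixels[0]) // factor
    result = []
    for by in range(height):
        row = []
        for bx in range(width):
            block = [
                pixels[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))  # integer average of the block
        result.append(row)
    return result


# Worked example: a 4000 x 3000 photo (12 megapixels) placed 4 inches wide.
width = target_width(display_inches=4, dpi=200)  # 800 pixels
factor = 4000 // width                          # shrink by 5 in each direction
print(width, factor, (4000 // factor) * (3000 // factor))  # 800 5 480000

tiny = [[0, 0, 255, 255], [0, 0, 255, 255], [10, 20, 30, 40], [50, 60, 70, 80]]
print(downsample(tiny, 2))  # [[0, 255], [35, 55]]
```

Shrinking by 5 in each direction leaves 1/25 of the pixels. Uncompressed at 3 bytes per RGB pixel, that is 1,440,000 bytes instead of 36,000,000, and this reduction happens before the JPEG encoder even runs.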

This is why image-heavy PDFs shrink dramatically and text-only PDFs barely shrink at all.

Step 3: Subset the Fonts

If your PDF uses a font that contains 5,000 characters but you only used 80 of them, the other 4,920 are dead weight. Font subsetting keeps only the glyphs your document actually uses. On documents with many embedded fonts, this can save tens of kilobytes per font.
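Conceptually, subsetting is a filter over the font's glyph table. This sketch uses a made-up font represented as a dictionary from character to glyph bytes, with 100 placeholder bytes per glyph. Real subsetters (fontTools' `subset` module is one example) work on glyph IDs, keep the required `.notdef` glyph, and rewrite the font's internal tables. None of that is shown here.

```python
def subset_font(glyphs: dict[str, bytes], text: str) -> dict[str, bytes]:
    """Keep only the glyph outlines for characters that actually appear in `text`."""
    used = set(text)
    return {char: outline for char, outline in glyphs.items() if char in used}


# Hypothetical font: 5,000 characters, each with a 100-byte placeholder outline.
fake_font = {chr(code): bytes(100) for code in range(32, 32 + 5000)}
subset = subset_font(fake_font, "Hello, world")
print(len(fake_font), len(subset))  # 5000 9
```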

Step 4: Re-Pack the Streams

Every stream object in a PDF can be wrapped in one or more "filters". Think of them as packers, similar to zip, each suited to a different kind of data. Common ones:

  • Flate (the same algorithm zip uses) for general-purpose data.
  • DCT for photographs (JPEG).
  • CCITT for black-and-white scans.
  • JBIG2 for ultra-efficient B&W document scans.

The compressor applies the right filter to each stream. Many old PDFs use no filter at all, so simply switching them to Flate produces an instant size drop.
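Python's standard `zlib` module produces the zlib/deflate data that the Flate filter expects, so re-packing an unfiltered stream takes only a few lines. The `flate_stream` helper and the sample drawing instructions are illustrative assumptions. A real compressor would also have to merge the new filter with any filters already listed in the stream's dictionary.

```python
import zlib


def flate_stream(data: bytes) -> bytes:
    """Wrap raw stream data as a PDF stream object body using the Flate filter."""
    packed = zlib.compress(data, 9)  # maximum compression level
    # /Length must count the compressed bytes, not the original ones.
    header = f"<< /Length {len(packed)} /Filter /FlateDecode >>\n".encode("ascii")
    return header + b"stream\n" + packed + b"\nendstream"


# A content stream with lots of repetition, as drawing instructions often have.
raw = b"0 0 m 100 100 l S\n" * 200  # 3600 bytes
wrapped = flate_stream(raw)
print(len(raw), len(wrapped))  # the wrapped stream is far smaller
```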

Step 5: Rebuild the Cross-Reference Table

Near the end of almost every PDF is a cross-reference table that tells viewers exactly where each object lives in the file. After all the rewriting and reordering, this table has to be rebuilt. The new table can also be more compact. Older PDFs store it as plain text, one fixed 20-byte line per object, while PDF 1.5 and later allow a compressed cross-reference stream instead.
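The plain-text format is easy to see if you build one. Each line of a classic table is exactly 20 bytes: a 10-digit byte offset, a 5-digit generation number, `n` for in use or `f` for free, and a two-character line ending. This minimal sketch assumes you already know each object's byte offset. The offsets below are made up, and `build_xref` is a hypothetical helper.

```python
def build_xref(offsets: list[int]) -> str:
    """Build a classic cross-reference table.

    offsets[i] is the byte offset of object i + 1. Object 0 is always the
    head of the free list.
    """
    # Subsection header: first object number, then how many entries follow.
    parts = ["xref\n", f"0 {len(offsets) + 1}\n"]
    # Entry 0: offset 0, generation 65535, "f" (free).
    parts.append("0000000000 65535 f \n")
    for offset in offsets:
        # 10-digit offset, 5-digit generation, "n" (in use), then " \n" as EOL.
        parts.append(f"{offset:010d} 00000 n \n")
    return "".join(parts)


print(build_xref([15, 64, 121, 210]))
```

With thousands of objects, those 20-byte lines add up. Packing the same information into binary fields inside a Flate-compressed cross-reference stream saves that space.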

Why Text Stays Sharp

This is the question we get most. The reason is simple: text in a PDF is not stored as pixels. It is stored as instructions ("draw the letter H at this position in this font"). Compression may subset the font and re-pack the streams that hold those instructions, but it never converts text to an image. So no matter how aggressively you compress, words stay crisp at any zoom level.
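Here is roughly what "draw the letter H at this position in this font" looks like inside a page's content stream. The sketch assumes a font registered on the page under the hypothetical resource name `F1` and a simple font where each character maps to one glyph. The `draw_text` helper is illustrative only.

```python
def draw_text(font: str, size: float, x: float, y: float, text: str) -> str:
    """Build a PDF content-stream snippet that draws `text` as vector text."""
    # Escape the characters that are special inside PDF literal strings.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # BT/ET begin and end a text object; Tf picks the font and size,
    # Td moves to the position, Tj shows the string.
    return f"BT /{font} {size} Tf {x} {y} Td ({escaped}) Tj ET"


print(draw_text("F1", 12, 72, 720, "H"))
# BT /F1 12 Tf 72 720 Td (H) Tj ET
```

The viewer re-renders these instructions at whatever zoom you choose, so compression has no pixels to blur.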

The exception: scanned documents. A scan is one big image per page, so the "text" you see is actually pixels. That is why scans benefit massively from compression - and why super-aggressive scan compression can make text look slightly fuzzy.
