PDF Compression Techniques Explained
Every major technique a modern PDF compressor uses, what kind of file each one helps, and roughly how much size it can save. A practical reference, not a textbook.
1. Object Cleanup
What it does: Removes objects that nothing in the document references any more (old thumbnails, leftover edit data, broken bookmarks) and merges duplicates such as the same font embedded more than once.
Best for: Any PDF that has been edited or re-saved multiple times in Word, Acrobat, or InDesign.
Typical savings: 5-20%. Sometimes much more on heavily edited documents.
Quality cost: None. This is purely garbage collection.
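The garbage-collection idea can be sketched as a reachability walk from the document root. This is a minimal sketch on a toy object graph, not a real PDF parser: the `objects` dictionary and the `find_unreferenced` helper are hypothetical names chosen for illustration.

```python
def find_unreferenced(objects, root):
    """Return the IDs of objects that cannot be reached from `root`.

    `objects` maps an object ID to the list of IDs it references
    (a toy stand-in for a parsed PDF object table).
    """
    reachable = set()
    stack = [root]
    while stack:
        obj_id = stack.pop()
        if obj_id in reachable or obj_id not in objects:
            continue
        reachable.add(obj_id)
        stack.extend(objects[obj_id])
    return sorted(set(objects) - reachable)


# Toy document: 1 = catalog, 2 = page tree, 3 = page, 4 = font,
# 5 = stale thumbnail, 6 = leftover edit data that points at 5.
objects = {1: [2], 2: [3], 3: [4], 4: [], 5: [], 6: [5]}
unreferenced = find_unreferenced(objects, root=1)
print(unreferenced)  # [5, 6]
```

Object 5 is still referenced by object 6, but neither is reachable from the root, so both can be dropped. That is why a reachability walk finds more than a simple "is anything pointing at this?" check.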
2. Stream Re-Filtering (Flate / Deflate)
What it does: Compresses every text and data stream with the Flate filter (Deflate, the same algorithm used by ZIP and PNG). Many older PDFs leave streams uncompressed entirely.
Best for: Older PDFs, PDFs generated by simple printer drivers, and anything created before 2010.
Typical savings: 10-40% on uncompressed input.
Quality cost: None. Flate is lossless.
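As a rough illustration, Python's standard `zlib` module implements the same Deflate compression, so the effect on an uncompressed stream can be sketched in a few lines. The content stream here is a made-up, highly repetitive example, so the ratio it shows is far better than the typical figures above.

```python
import zlib

# A repetitive, uncompressed page content stream (toy example).
raw_stream = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 200

# Level 9 trades compression speed for the smallest output.
compressed = zlib.compress(raw_stream, 9)

# Lossless check: decompressing returns the original bytes exactly.
restored = zlib.decompress(compressed)

print(len(raw_stream), len(compressed), restored == raw_stream)
```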
3. Image Downsampling
What it does: Reduces image resolution to match how the image is actually used on the page. A 4000-pixel-wide photo placed in a 600-pixel-wide column gets resampled down to ~600 pixels.
Best for: PDFs exported from cameras, phones, or design tools where every photo is at full resolution.
Typical savings: 50-90% per image.
Quality cost: Usually invisible on screen at normal zoom. Softness shows when zooming far past 100% or when printing at high resolution.
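The arithmetic behind the target size can be sketched as follows. This assumes the placed width is known in PDF points (72 points per inch) and that 150 dpi is the chosen target. Both the 150 dpi default and the function names are assumptions for illustration, not settings of any particular tool.

```python
import math


def target_pixel_width(placed_width_pt, target_dpi):
    """Pixels needed to show an image at `target_dpi` when it is placed
    `placed_width_pt` points wide (72 points = 1 inch)."""
    return math.ceil(placed_width_pt / 72 * target_dpi)


def downsample_size(src_w, src_h, placed_width_pt, target_dpi=150):
    """Return the (width, height) to resample to, keeping the aspect ratio."""
    new_w = target_pixel_width(placed_width_pt, target_dpi)
    if new_w >= src_w:
        return src_w, src_h  # never upsample: that adds bytes, not detail
    new_h = round(src_h * new_w / src_w)
    return new_w, new_h


# A 4000x3000 photo placed 288 pt (4 inches) wide, targeting 150 dpi.
new_size = downsample_size(4000, 3000, 288, 150)
print(new_size)  # (600, 450)
```

In this example the pixel count drops from 12 million to 270,000. The actual byte savings depend on how the resampled image is then encoded.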
4. JPEG (DCT) Re-Encoding for Photos
What it does: Photos and continuous-tone images are re-encoded as JPEG with an adjustable quality setting (typically 60-85 for general use).
Best for: Brochures, portfolios, scanned color documents.
Typical savings: 30-70% per photo.
Quality cost: Slight, mostly in subtle gradients. Most viewers will not notice.
5. JPEG2000 for High-Quality Images
What it does: A more advanced image codec that produces smaller files than JPEG at the same visual quality.
Best for: High-end print documents, art portfolios where quality matters.
Typical savings: Files 20-30% smaller than JPEG at the same quality target.
Quality cost: Lower than JPEG at the same file size. The downsides are slower decoding and weaker support in very old PDF readers.
6. CCITT Group 4 for Black-and-White Scans
What it does: A lossless encoding designed specifically for 1-bit black-and-white images. Originally invented for fax machines.
Best for: Pure B&W scanned forms, contracts, line drawings.
Typical savings: Massive - often 95%+ compared with storing the same image as a raw bitmap.
Quality cost: None. Lossless on 1-bit images.
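The reason 1-bit scans compress so well is that each row is mostly long runs of white with short runs of black. The sketch below shows only that run-length idea. It is a deliberate simplification and not the Group 4 bit format, which additionally codes each row relative to the row above it. The function names are hypothetical.

```python
def run_lengths(row):
    """Encode a row of 0/1 pixels as run lengths, starting with a white (0) run.

    A row that begins with black gets a leading white run of length 0.
    """
    runs = []
    current = 0
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def decode_runs(runs):
    """Rebuild the pixel row from alternating white/black run lengths."""
    row = []
    color = 0
    for n in runs:
        row.extend([color] * n)
        color = 1 - color
    return row


row = [0] * 40 + [1] * 5 + [0] * 55   # 100 pixels, one short black mark
runs = run_lengths(row)
print(runs)  # [40, 5, 55]
```

One hundred pixels collapse to three numbers, and decoding them reproduces the row exactly, which is what "lossless" means here.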
7. JBIG2 for B&W Document Scans
What it does: The most aggressive method for black-and-white scans. JBIG2 detects repeating shapes (like the letter "e") and stores each one only once.
Best for: Long scanned books, archives, government records.
Typical savings: 2-5x smaller than CCITT.
Quality cost: None in lossless mode. In lossy mode, similar-looking characters can be substituted for one another (for example, a 6 rendered as an 8). Use lossless JBIG2 for legal documents.
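The "store each shape once" idea can be sketched as a symbol dictionary plus a list of placements. This is a toy model, not the JBIG2 bitstream: glyph bitmaps are tuples of strings, the helper name is hypothetical, and matching is exact, so nothing is ever substituted. A lossy encoder would also merge bitmaps that are merely similar, which is where character confusion comes from.

```python
def build_symbol_dictionary(glyphs):
    """Store each distinct glyph bitmap once.

    `glyphs` is a list of (bitmap, x, y). Returns (symbols, placements),
    where placements are (symbol_index, x, y) tuples.
    """
    symbols = []     # distinct bitmaps, in first-seen order
    index_of = {}    # bitmap -> position in `symbols`
    placements = []
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(symbols)
            symbols.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return symbols, placements


E = ("###", "#..", "###", "#..", "###")
L = ("#..", "#..", "#..", "#..", "###")
page = [(E, 0, 0), (L, 4, 0), (E, 8, 0), (E, 12, 0)]

symbols, placements = build_symbol_dictionary(page)
print(len(symbols), placements)
```

Four glyphs on the page need only two stored bitmaps. On a long scanned book the same letter shapes recur thousands of times, which is why the savings grow with document length.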
8. Font Subsetting
What it does: Embeds only the glyphs your document actually uses. A font that ships with 5,000 glyphs typically gets pruned to 50-200 in real documents.
Best for: Documents with many embedded fonts - design exports, multilingual reports.
Typical savings: 20-80% per font.
Quality cost: None for typical use. Editing the PDF later may show "missing glyph" boxes if you add characters not in the subset.
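A minimal sketch of the subsetting decision, and of why later edits can hit missing glyphs, is shown below. It treats one character as one glyph, which is a simplification (real fonts map characters to glyphs through ligatures and shaping), and both helper names are hypothetical.

```python
def used_characters(text):
    """Characters the document actually uses; a subset keeps only these."""
    return set(text)


def missing_from_subset(subset, new_text):
    """Characters in `new_text` that the subset font cannot draw."""
    return sorted(set(new_text) - subset)


subset = used_characters("PDF compression")
missing = missing_from_subset(subset, "PDF compressor!")
print(len(subset), missing)  # 13 ['!']
```

Every letter of the edited text happens to be in the subset already, but the exclamation mark is not, so it would render as a "missing glyph" box.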
9. Object Stream Compression
What it does: Bundles many small objects into a single compressed stream rather than storing each separately. Available in PDF 1.5+.
Best for: Documents with thousands of small objects - tagged PDFs, accessibility-rich documents, complex forms.
Typical savings: 5-15%.
Quality cost: None.
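The effect of bundling can be sketched by comparing many small objects written out individually with the same bodies concatenated and Flate-compressed as one stream. The objects below are made-up structure-tree entries, and the sketch omits the index of object numbers and offsets that a real object stream carries, so treat the numbers as illustrative only.

```python
import zlib

# Toy stand-ins for many small PDF objects (e.g. structure-tree entries).
small_objects = [
    b"<< /Type /StructElem /S /P /P 5 0 R /K %d >>" % i
    for i in range(1000)
]

# Stored separately: each body is uncompressed and carries its own
# "N 0 obj ... endobj" framing.
separate = sum(
    len(b"%d 0 obj\n" % (i + 10)) + len(body) + len(b"\nendobj\n")
    for i, body in enumerate(small_objects)
)

# Bundled: bodies concatenated and compressed together as one stream.
bundled = len(zlib.compress(b"\n".join(small_objects), 9))

print(separate, bundled)
```

Because these toy objects are nearly identical, the ratio is far more dramatic than the 5-15% whole-file saving quoted above, which also counts pages, images, and fonts that object streams do not touch.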
10. Cross-Reference Stream Compression
What it does: Replaces the verbose ASCII cross-reference table at the end of older PDFs with a compact compressed stream.
Best for: Any PDF still using the legacy table format.
Typical savings: 1-5%. Small but free.
Quality cost: None.
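The size difference can be sketched by building the same entries both ways. In the legacy table every entry is a fixed 20 bytes of ASCII. In a cross-reference stream the entries are packed binary fields that are then Flate-compressed. The offsets and the 1/4/2-byte field widths below are assumptions chosen for illustration.

```python
import struct
import zlib

# Hypothetical byte offsets for 2,000 in-use objects.
offsets = [15 + 137 * i for i in range(2000)]

# Legacy table: 10-digit offset, 5-digit generation, "n", 20 bytes per entry.
ascii_table = b"".join(b"%010d %05d n \n" % (off, 0) for off in offsets)

# Stream style: type (1 byte), offset (4 bytes), generation (2 bytes),
# all big-endian, then Flate-compressed.
binary_rows = b"".join(struct.pack(">BIH", 1, off, 0) for off in offsets)
xref_stream = zlib.compress(binary_rows, 9)

print(len(ascii_table), len(binary_rows), len(xref_stream))
```

The table is a small part of most files, which is why the overall saving stays in the low single digits even though the table itself shrinks considerably.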
Quick Reference: Which Technique for Which PDF
| Document type | Highest-impact technique |
|---|---|
| Word/Pages export with photos | Image downsampling + JPEG re-encoding |
| Color scan | Downsampling + JPEG re-encoding |
| Black-and-white scan | CCITT or JBIG2 |
| Text-only document with custom fonts | Font subsetting |
| Old PDF (pre-2010) | Stream re-filtering + object cleanup |
| PDF that has been re-saved many times | Object cleanup |
| Tagged or accessible PDF | Object stream compression |