With the digitalization of document workflows rising steeply over the last year, a growing number of companies are looking for a performant, automated solution for optimizing PDFs in high volumes. In this blog post we take a look at this challenge, provide a programmatic solution in Java and C# (.NET) and, more importantly, show how it can save you time and resources.
Why optimize PDF?
As you have probably been told numerous times already, data is the new currency. The total big data market is growing exponentially as we can see in this projection graph.
With server providers charging per gigabyte used, the investment increases as you aggregate more and more data. PDF plays a considerable part in this, since it is the most-used format for published documents.
As we will see in this article, PDFs can be optimized to less than half of their original file size, often without any noticeable loss of quality. That makes optimizing PDF a no-brainer.
When do we want it? Now!
In most cases the need for optimization comes down to reducing file size, i.e., compression. A smaller file size means increased speed. When you need to exchange or publish files online, a delay of a mere second can ruin the user experience. Nobody likes to wait, and your information-hungry users are in the front row of that movement.
Archiving is another aspect of your workflow that requires compression in order to have a performant and space-saving archive. Legal requirements often demand a large number of documents to be kept available. A hellish task awaits if the process is not automated.
Optimizing PDF also syncs extremely well with the other steps in your document workflow such as digital signing, data extraction and redaction. As an example, this is how PDF optimization can fit into an OCR document workflow.
Other use cases for optimization include standardizing heterogeneous collections of PDFs from various sources (think files submitted by end users) and optimizing for print (converting to CMYK, for example).
Less doesn’t have to mean a pixelated mess
Compression tends to imply a loss of visual quality, but that isn’t necessarily so. Luckily, we don’t have to resort to methods that leave our PDFs looking like they came straight from an ‘80s arcade machine. Let’s have a look at several optimization options that can leave the visual appearance untouched:
- Intelligent compression of images: different images require different compression and scaling techniques. Some of them can be applied without the user noticing.
- Removing duplicates: remove duplicate instances of embedded fonts and images.
- Font subsetting: removing unused characters of a font.
- Stream compression: binary streams can be compressed without quality loss in the generated files.
- Attachment compression: files embedded in the PDF can be compressed too.
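The stream compression item in the list above can be tried out with nothing but the JDK. PDF's FlateDecode filter uses the same zlib/Deflate format that `java.util.zip` produces, so the sketch below round-trips a stand-in content stream to show that nothing is lost. The class name, the sample stream and the repeat count are assumptions made for illustration; this is not how pdfOptimizer itself is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class StreamCompressionSketch {

    /** Compresses the input with Deflate (zlib wrapper) at the highest compression level. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes; rejects truncated or malformed input. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not a valid Deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for an uncompressed page content stream (hypothetical sample data).
        String line = "BT /F1 12 Tf 72 720 Td (Heading 1) Tj ET\n";
        byte[] original = line.repeat(200).getBytes(StandardCharsets.US_ASCII);

        byte[] compressed = deflate(original);
        byte[] restored = inflate(compressed);

        System.out.println(original.length + " -> " + compressed.length + " bytes");
        System.out.println("lossless: " + Arrays.equals(original, restored));
    }
}
```

Repetitive data such as content streams tends to shrink a lot, while streams that are already compressed (JPEG images, for example) gain little from a second pass.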
The author exploring his hobby. While you can’t spot any difference in image quality, the right image has a file size of 560 KB compared to the original 1.9 MB on the left.
Arial Unicode MS has 51,180 glyphs (38,911 characters), supports 32 code pages, and contains Latin and Han Ideographic OpenType layout tables. With font subsetting we can choose to embed only the characters we need. Since we only use the font for our headings, in this case we only need 14 characters (H, e, a, d, i, n, g, 1, S, u, b, 2, 3 and the period).
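The first step of subsetting, working out which characters a document actually uses, can be sketched in a few lines of Java. The class name and the heading texts below are hypothetical, chosen so that they use exactly the characters listed above. Whitespace is skipped to match that list, and a real subsetter would go on to map these characters to glyphs and rewrite the font tables, which this sketch does not attempt.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SubsetCharactersSketch {

    /** Collects the distinct non-whitespace code points used across the given texts. */
    static Set<Integer> requiredCodePoints(List<String> texts) {
        Set<Integer> codePoints = new LinkedHashSet<>();
        for (String text : texts) {
            text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .boxed()
                .forEach(codePoints::add);
        }
        return codePoints;
    }

    public static void main(String[] args) {
        // Hypothetical heading texts; only these would be set in the heading font.
        List<String> headings = List.of("Heading 1.", "Sub 2.", "Sub 3.");

        Set<Integer> needed = requiredCodePoints(headings);

        StringBuilder characters = new StringBuilder();
        for (int codePoint : needed) {
            characters.appendCodePoint(codePoint);
        }
        System.out.println(needed.size() + " characters needed: " + characters);
    }
}
```

For these three headings the set contains 14 code points, a tiny fraction of the tens of thousands of glyphs in the full font.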
Using iText pdfOptimizer to optimize PDFs
Now that we have seen several applications that require PDF optimization, and several options that retain the visual integrity of your PDFs, how can you do this programmatically? And how can you manage these different applications with ease?
iText’s add-on pdfOptimizer provides all the functionality you need. Below is a quick example showing how you can select a profile to optimize PDFs with lossless compression. You can easily use different profiles or customize to the smallest detail.
Want to try this for yourself? Check out pdfOptimizer in the iText Demo Lab and see the effect of various optimization strategies on your own PDFs. The fully-fledged add-on also includes the ability to define custom optimization profiles and detailed optimization reports so you can keep track of the results of your optimizations.
Getting started
Ready to optimize your PDFs? iText pdfOptimizer is available as an add-on for iText 7 Core. You can try pdfOptimizer as part of the iText 7 Suite 30-day trial.
Looking to further enhance your digital document workflow? iText has you covered in all aspects: digitally signing PDFs, automating PDF generation, automating PDF processing, PDF manipulation, data-driven PDF templates, converting PDFs to images, turning scanned files and images into searchable PDFs with OCR, creating PDF from HTML, accurate rendering of multiple languages and scripts, redacting PDF content, and more.