On 22 July 2020, a team of security researchers investigating issues relating to PDF encryption and digital signatures announced a novel series of vulnerabilities, which they named the PDF Shadow Attacks. We invited Michael Klink, independent PDF expert and top StackOverflow contributor @mkl, on board as a technical consultant to take a closer look at the PDF Shadow Attacks. As a result of this collaboration, we have already presented an overview of the three variants of Shadow Attacks, and a method to detect potentially malicious hidden content in a PDF.
In this final part, we’ll put our detective hats on straight away. This time, we’ll be looking for PDF content that’s ready to be replaced by harmless-looking PDF additions. Last but not least, we’ll take a closer look at the hide-and-replace breed of attack, which has great potential for abuse, and requires us to dig deep into the internal structure of a PDF. The heuristics we’ll build can be used as a starting point for hardening your application.
Detecting Preparations for “Replace” Shadow Attacks
In the case of the replace attack we need to look for content that might change as a side effect of some later addition to the document, after signing. The researchers use an example that relies on text form fields where the value that is displayed is different from the one that is internally associated with the field.
In order to check a PDF for form fields that have been rigged in this way, we’d have to apply text extraction to the form field appearances and compare those results with the internally stored values. Collecting the values of a form is possible by iterating over the form fields, and extracting the text from the appearance XObject:
(Code from WidgetAnalyzer.java and WidgetAnalyzer.cs for .NET)
We use the following helper class FieldValues for storing both the actual (underlying) value and the value that was gleaned from the visual appearance:
Using the FieldValues<String> values information for a single field allows us to determine whether or not a field looks suspicious:
False positives and other potential pitfalls
It’s probably wise to avoid naïve strict equality checking between the underlying form field value, and the value gleaned from the widget’s appearance if you don’t want to get too many false positives. Instead, it’s better to normalize both by trimming whitespace and line breaks before comparing values.
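As a rough illustration of such a normalized comparison, here is a minimal sketch in plain Java. It does not use iText and is not the code from WidgetAnalyzer.java: the class name FieldValuePair and its methods are hypothetical stand-ins, and collapsing every run of whitespace to a single space is an assumption about what a reasonable normalization might look like.

```java
/**
 * Hypothetical stand-in for a per-field value holder: the value stored
 * internally in the field, and the text extracted from its appearance.
 */
final class FieldValuePair {
    final String actual;      // value associated internally with the field
    final String appearance;  // text gleaned from the widget's visual appearance

    FieldValuePair(String actual, String appearance) {
        this.actual = actual;
        this.appearance = appearance;
    }

    /** Trims the value and collapses whitespace runs (including line breaks) to one space. */
    static String normalize(String value) {
        return value == null ? "" : value.trim().replaceAll("\\s+", " ");
    }

    /** A field looks suspicious if both values still differ after normalization. */
    boolean looksSuspicious() {
        return !normalize(actual).equals(normalize(appearance));
    }

    public static void main(String[] args) {
        FieldValuePair harmless = new FieldValuePair("John Doe", "  John Doe\n");
        FieldValuePair rigged = new FieldValuePair("100 EUR", "1000 EUR");
        System.out.println(harmless.looksSuspicious()); // false
        System.out.println(rigged.looksSuspicious());   // true
    }
}
```

Depending on your documents, you may want a stricter or looser normalization, for example one that also treats non-breaking spaces as whitespace.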
If the values are different even after you apply such normalizations, the document should be scrutinized further before signing. In addition to a potential replace Shadow Attack lurking in the, well, shadows, you also need to consider how the document might be misinterpreted later on.
As an example, an automated form field extraction process will usually rely on the value that was associated internally with the form field, and won’t take the trouble of extracting the value that is displayed to the user, even though that would be the value that was digitally signed.
Before signing interactively, it’s always a good idea to extract all these values and display them to the user, so there can be no confusion or misunderstanding as to what exactly is being signed.
Detecting Preparations for “Hide and Replace” Shadow Attacks
Lastly, we’ll talk about building some defenses against the hide-and-replace attacks. The researchers themselves cite this variant of the Shadow Attacks as having the greatest potential for abuse, since it can change the content of an entire document!
Detecting a hide-and-replace attack requires a more low-level analysis of the PDF document. Technically speaking, at the level of the PDF’s internal structure, we’d have to look for multiple indirect objects with the same object number inside the same document, particularly if they occur in the same revision. The researchers provide an example in which a single PDF document contains two different versions of an object containing the root of the Pages tree, giving an attacker the opportunity to surreptitiously switch them around once the document is signed.
By default, iText only provides access to a single object per object number which is the one referenced from the document cross references. If there are different references for that object number in different cross reference tables, then the one from the newest cross reference table is used. This means we’ll have to do some low-level coding here.
There is one use case, though, in which iText does not rely on the cross references but actually scans the file for the indirect objects themselves: the rebuilding of cross-reference information for damaged PDFs! For a method to find all indirect objects in a PDF, even those without references from the cross-reference tables, we can let ourselves be inspired by the PdfReader method rebuildXref, for example like this:
(Code from ObjectStructureAnalyzer.java and ObjectStructureAnalyzer.cs for .NET)
Note: In the Java example we’ve used the Google Guava Multimap here instead of juggling with maps of sets etc. ourselves.
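To give a feel for what scanning a file for indirect objects involves, here is a deliberately naive sketch in plain Java. It is not iText’s rebuildXref and not the ObjectStructureAnalyzer code: the class RawObjectScanner and its method are hypothetical, it uses plain JDK maps rather than Guava, and it simply searches the raw bytes for "<number> <generation> obj" markers. That means it cannot see objects packed into object streams and may report matches that merely occur inside stream data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists every byte offset at which an object header appears. */
final class RawObjectScanner {
    // "<object number> <generation> obj", not preceded by a letter or digit
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9A-Za-z])(\\d{1,9})\\s+(\\d{1,5})\\s+obj\\b");

    /** Returns object number -> byte offsets of all definitions found in the file. */
    static Map<Integer, List<Integer>> findObjectOffsets(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so match positions are byte offsets
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<Integer, List<Integer>> offsets = new TreeMap<>();
        Matcher matcher = OBJ_HEADER.matcher(text);
        while (matcher.find()) {
            int objectNumber = Integer.parseInt(matcher.group(1));
            offsets.computeIfAbsent(objectNumber, k -> new ArrayList<>())
                   .add(matcher.start(1));
        }
        return offsets;
    }
}
```

Any object number that maps to more than one offset is defined several times in the file and is worth a closer look.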
ExtPdfIndirectReference is a simple helper class:
(Code from ExtPdfIndirectReference.java and ExtPdfIndirectReference.cs for .NET)
Using these classes one can determine suspicious extra objects like this:
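The underlying idea can be sketched without iText as a simple set difference: every object definition found by scanning the file that is not pointed at by any cross-reference entry is an “extra” object. The following plain Java sketch assumes you have already collected both sides; the class ExtraObjectFinder, its method, and the offsets in the usage example are hypothetical and only illustrate the comparison step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: reports object definitions no cross-reference entry points at. */
final class ExtraObjectFinder {
    /**
     * @param scanned           object number -> offsets of all definitions found in the file
     * @param referencedOffsets offsets referenced from the cross-reference sections
     *                          (assumed to be the union over all revisions)
     * @return object number -> offsets of definitions that are not referenced
     */
    static Map<Integer, List<Integer>> findUnreferenced(
            Map<Integer, List<Integer>> scanned, Set<Integer> referencedOffsets) {
        Map<Integer, List<Integer>> extras = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : scanned.entrySet()) {
            for (Integer offset : entry.getValue()) {
                if (!referencedOffsets.contains(offset)) {
                    extras.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                          .add(offset);
                }
            }
        }
        return extras;
    }

    public static void main(String[] args) {
        // Made-up offsets: object 2 is defined twice, but only one copy is referenced
        Map<Integer, List<Integer>> scanned = new HashMap<>();
        scanned.put(1, Arrays.asList(9));
        scanned.put(2, Arrays.asList(60, 95));
        Set<Integer> referenced = new HashSet<>(Arrays.asList(9, 95));
        System.out.println(findUnreferenced(scanned, referenced)); // {2=[60]}
    }
}
```

Passing in the offsets from all revisions keeps superseded objects from legitimate incremental updates out of the result, so what remains are definitions that no revision ever referenced.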
Particular PDF processors + preliminary object versions = potential false positives
Beware, this test can also return false positives since there are certain PDF processors which leave preliminary versions of objects in PDFs. This is unfortunate as these preliminary object versions can also be recognized as suspicious here.
On the other hand, there is some potential for such preliminary object versions to be abused by attackers, e.g. applying Shadow Attack-like techniques to signed PDFs they were unable to prepare beforehand. So, it is probably beneficial to be alerted that these objects exist if you’re looking for dubious content inside a document. Forewarned is forearmed after all.
Conclusion
With that, we wrap up our investigations into the PDF Shadow Attacks. Hopefully, the provided code examples will prove useful for protecting your documents, applications, and workflows against these types of attack. If they do, we’d love it if you let us know!
Once again, we’d like to thank Michael Klink for agreeing to work with us as a technical consultant, and for confirming our findings that iText was not affected by the published attacks. We also thank him for collaborating on this series of articles and providing both Java and C# versions of his code examples.
Michael has been on Stack Overflow for the past 8 years, and in that time has proven to be one of the most valued members of the iText community, answering various PDF, iText and digital signature-related questions posed by users.
If you’re interested in learning more about PDF security with iText 7, including an overview of encryption, redaction and digital signatures, we recommend watching the Encryption and Digital Signatures webinar we presented along with the PDF Association earlier this year. This webinar details how to protect PDFs with encryption and digital signatures using iText 7 Core, and also covers achieving secure content redaction with the pdfSweep add-on. Alternatively, see our recent blog on the top three ways to improve your PDF document security.
You can also check out our Digital Signatures solutions page for code examples, use cases and other resources.