Investigating PDF Shadow Attacks: In-Depth PDF Security using iText (Part 2)

PDF Shadow Attacks

The RUB researchers previously published their findings for other types of attacks back in early 2019[1]. This resulted in the hardening of many PDF signature validators, e.g. Adobe Acrobat Reader and iText, against incorrectly reporting manipulated signed PDFs as validly signed. This older research focused on means to manipulate signed PDFs once an attacker had already got their hands on the document.

The Shadow Attacks published in July focus on a different scenario: here the attackers already have a hand in the creation of the PDF before signing and can therefore plant invisible content in it. After signing, they then attempt to make this previously hidden content visible without causing validators to raise any warnings; the validators are misled by the fact that this content was already present in the originally signed document. These after-signing manipulations are all applied as incremental updates, i.e. by adding objects or changing references to objects only in data appended to the signed PDF.

Safe update?

Let’s start with the good news: iText does not fall victim to this class of attacks. As explained in the previous post in this series, all of the Shadow Attacks are based on incremental updates to the signed document, either by adding objects, or by changing references that don’t look suspicious to validators but do cause a change of the visible document content.

The methods in iText 7 related to checking incremental updates, however, do not attempt to distinguish between innocent and suspicious changes at all. Instead, iText 7 conservatively considers all incremental updates to a document as changes. The notion of a safe incremental update does not exist in iText. Simply put, either the document is changed after the fact or it isn’t. There is no middle ground.

Two methods in particular from the SignatureUtil class merit some closer examination. When iText 7 is presented with manipulated sample PDFs from the PDF Insecurity website, it’s able to correctly identify that the documents were changed in some way after signing. If we call SignatureUtil.signatureCoversWholeDocument on those documents, iText returns false, meaning some objects in the document are not part of the signature.

The iText 7 SignatureUtil class also has a method to retrieve the revision of the document that was signed. If we call extractRevision on the manipulated PDFs, we don’t get back the latest document revision, but instead we get the document revision of when the documents were originally signed, just like we would expect.
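
To make these two checks a bit more tangible, here is a minimal, library-free sketch of the underlying idea. It is not iText's implementation: the class ByteRangeSketch and its methods are hypothetical names, and we assume the signature's /ByteRange array has already been read as four numbers (offset and length of the two signed ranges surrounding the signature value).

```java
import java.util.Arrays;

/** Simplified illustration only; hypothetical helper, not part of iText. */
final class ByteRangeSketch {

    /**
     * byteRange is assumed to hold [offset1, length1, offset2, length2].
     * Returns true only if the signed ranges start at byte 0 and end exactly
     * at the end of the file, i.e. nothing was appended after signing.
     */
    static boolean coversWholeFile(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false;
        }
        long start1 = byteRange[0], len1 = byteRange[1];
        long start2 = byteRange[2], len2 = byteRange[3];
        boolean startsAtZero = start1 == 0;
        boolean ordered = len1 >= 0 && len2 >= 0 && start2 >= start1 + len1;
        boolean endsAtEof = start2 + len2 == fileLength;
        return startsAtZero && ordered && endsAtEof;
    }

    /** Returns the bytes of the revision that was signed: everything up to the end of the second range. */
    static byte[] signedRevision(byte[] file, long[] byteRange) {
        long end = byteRange[2] + byteRange[3];
        if (end < 0 || end > file.length) {
            throw new IllegalArgumentException("ByteRange points outside the file");
        }
        return Arrays.copyOfRange(file, 0, (int) end);
    }
}
```

With a ByteRange of [0, 100, 200, 300], a 500-byte file is fully covered. The same file with 150 bytes of incremental update appended is not, and its signed revision is the first 500 bytes. A real validator has to check considerably more than this, so treat the sketch purely as an illustration of why any appended data makes the first check fail.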

Limited vulnerability?

The RUB researchers, however, go as far as to state that not distinguishing between safe and unsafe modifications in and of itself constitutes a limited vulnerability. They argue on page 15 of the report[2] that if “the same warning is raised in case of an allowed modification (e.g., commenting) as well as in case of unallowed modifications (attacks)” then “[…] victims are unable to distinguish between both cases”.

This would mean that, according to the researchers, a conservative approach such as the one implemented by iText is also considered a vulnerability, albeit to a lesser degree. In our opinion, this terminology only makes sense if the software in question promises otherwise: only if the software promises to distinguish between safe and unsafe updates but then fails to do so should this be considered a vulnerability.

iText 7 doesn’t promise to analyze changes introduced by incremental updates to PDFs. The only exception would be for retrieving LTV information, which, depending on the validation model, is only used if it appears in revisions of the document from after it was signed. We consider this to be the expected behavior of iText 7.

Of course, if some software uses iText for signature validation, but additionally promises to recognize allowed and disallowed changes, it would have to include additional measures to distinguish malicious documents from genuine ones. If such software failed to accurately flag unsafe changes to a document, that would indeed constitute a limited vulnerability to the PDF Shadow Attacks.

Making a security lasagna with iText

PDF software vendors and manufacturers were informed well in advance by the researchers, giving them the opportunity to harden their software before the findings were made public. Adobe Acrobat, for example, is no longer vulnerable as of version 2020.009.20063 (cf. the Adobe Security Bulletin APSB20-24, published May 12th, 2020).

Nonetheless, the Shadow Attacks should not be considered ineffective yet. Software updates are notoriously slow to trickle down to end users’ devices, especially within large organizations or enterprises with strict policies and procedures for rollouts of software. To make matters worse, the researchers have not yet exhausted all available options at their disposal for forging documents!

A malicious user might prepare documents so that they are susceptible to PDF Shadow Attacks. So instead of naively signing every PDF document that comes your way, it would make sense to consider methods of flagging suspicious documents early on. One option would be to harden your application with additional heuristics for detecting and flagging such documents. This fits perfectly with the philosophy of defense in depth, where security controls are layered on top of one another. No individual security control is infallible in and of itself, but combined they are strong enough to provide solid security.

What would such heuristics look like? Well, for starters, what better PDF library to build them on than iText 7? In what follows, we’ll create a proof-of-concept for detecting PDF Shadow Attack preparation that could serve as a starting point for detecting malicious documents in your own applications.

Please be aware, though, that these code samples serve only as a guideline, and should not in any way be considered infallible mechanisms for detecting malicious intent. A generalized toolkit for finding clues that point to malicious intent in terms of Shadow Attacks most likely has to consider many more cases.

Detecting Preparations for “Hide” Shadow Attacks

In the case of the hide attack, one has to look for content hidden beneath other content, especially when the latter can somehow be made unreachable for a viewer. The RUB researchers give the obvious example of an image resource obscuring some text. After the document is signed, the overlay is then removed, revealing whatever an attacker might want you to believe was signed initially. Most viewers don’t warn users when content is removed after signing, only when new content is added.

To detect PDFs where text is overlaid by an image, one can simply enhance generic text extraction code a bit. First, the LocationTextExtractionStrategy needs to be enhanced to also be aware of images, and to check for text overlaid by images.

// Java
class Strategy extends LocationTextExtractionStrategy {
    @SuppressWarnings("unchecked")
    public Strategy(int pageNr) {
        super();
        this.pageNr = pageNr;
        try {
            // locationalResult is private in the base class, so it is read via reflection
            field = LocationTextExtractionStrategy.class.getDeclaredField("locationalResult");
            field.setAccessible(true);
            locationalResult = (List<TextChunk>) field.get(this);
        } catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e) {
            throw new RuntimeException("Failure retrieving LocationTextExtractionStrategy member locationalResult", e);
        }
    }
    @Override
    public void eventOccurred(IEventData data, EventType type) {
        if (type == EventType.RENDER_IMAGE) {
            ImageRenderInfo renderInfo = (ImageRenderInfo) data;
            Matrix imageCtm = renderInfo.getImageCtm();
            AffineTransform inverseCtm = inverse(imageCtm);
            // A non-invertible image matrix covers no area, so there is nothing to check
            if (inverseCtm != null) {
                // Collect the text chunks seen so far that this image does NOT cover
                List<TextChunk> notCovered = new ArrayList<>(locationalResult.size());
                for (TextChunk chunk : locationalResult) {
                    Point checkPoint = getCheckPoint(chunk.getLocation());
                    Point pullback = inverseCtm.transform(checkPoint, null);
                    if (!isInUnitSquare(pullback))
                        notCovered.add(chunk);
                }
                if (notCovered.size() < locationalResult.size()) {
                    // Temporarily keep only the covered chunks to extract the hidden text
                    locationalResult.removeAll(notCovered);
                    String text = getResultantText();
                    HiddenText hiddenText = new HiddenText(pageNr, imageCtm, text, renderInfo.getImage());
                    hiddenTexts.add(hiddenText);
                    locationalResult.clear(); // Or not?
                    locationalResult.addAll(notCovered);
                }
            }
        }
        super.eventOccurred(data, type);
    }
    // Midpoint between the start and end location of a text chunk
    Point getCheckPoint(ITextChunkLocation location) {
        Vector start = location.getStartLocation();
        Vector end = location.getEndLocation();
        return new Point((start.get(Vector.I1) + end.get(Vector.I1)) / 2, (start.get(Vector.I2) + end.get(Vector.I2)) / 2);
    }
    boolean isInUnitSquare(Point point) {
        double x = point.getX();
        double y = point.getY();
        return 0 <= x && x <= 1 && 0 <= y && y <= 1;
    }
    // Returns the inverse of the image matrix, or null if it cannot be inverted
    AffineTransform inverse(Matrix ctm) {
        try {
            AffineTransform t = new AffineTransform(
                    ctm.get(Matrix.I11), ctm.get(Matrix.I12),
                    ctm.get(Matrix.I21), ctm.get(Matrix.I22),
                    ctm.get(Matrix.I31), ctm.get(Matrix.I32)
            );
            return t.createInverse();
        } catch (NoninvertibleTransformException e) {
            return null;
        }
    }
    public List<HiddenText> getHiddenTexts() {
        return hiddenTexts;
    }
    final int pageNr;
    final List<HiddenText> hiddenTexts = new ArrayList<>();
    final Field field;
    final List<TextChunk> locationalResult;
}

// C#
class Strategy : LocationTextExtractionStrategy
{
    public Strategy(int pageNr)
    {
        PageNr = pageNr;
        // locationalResult is private in the base class, so it is read via reflection
        FieldInfo field = typeof(LocationTextExtractionStrategy).GetField("locationalResult", BindingFlags.NonPublic | BindingFlags.Instance);
        locationalResult = (List<TextChunk>) field.GetValue(this);
    }
    public override void EventOccurred(IEventData data, EventType type)
    {
        if (type == EventType.RENDER_IMAGE)
        {
            ImageRenderInfo renderInfo = (ImageRenderInfo)data;
            Matrix imageCtm = renderInfo.GetImageCtm();
            AffineTransform inverseCtm = Inverse(imageCtm);
            // A non-invertible image matrix covers no area, so there is nothing to check
            if (inverseCtm != null)
            {
                // Collect the text chunks seen so far that this image does NOT cover
                List<TextChunk> notCovered = new List<TextChunk>(locationalResult.Count);
                foreach (TextChunk chunk in locationalResult)
                {
                    Point checkPoint = GetCheckPoint(chunk.GetLocation());
                    Point pullback = inverseCtm.Transform(checkPoint, null);
                    if (!IsInUnitSquare(pullback))
                        notCovered.Add(chunk);
                }
                if (notCovered.Count < locationalResult.Count)
                {
                    // Temporarily keep only the covered chunks to extract the hidden text
                    locationalResult.RemoveAll(notCovered.Contains);
                    String text = GetResultantText();
                    HiddenText hiddenText = new HiddenText(PageNr, imageCtm, text, renderInfo.GetImage());
                    HiddenTexts.Add(hiddenText);
                    locationalResult.Clear(); // Or not?
                    locationalResult.AddRange(notCovered);
                }
            }
        }
        base.EventOccurred(data, type);
    }
    // Midpoint between the start and end location of a text chunk
    Point GetCheckPoint(ITextChunkLocation location)
    {
        Vector start = location.GetStartLocation();
        Vector end = location.GetEndLocation();
        return new Point((start.Get(Vector.I1) + end.Get(Vector.I1)) / 2, (start.Get(Vector.I2) + end.Get(Vector.I2)) / 2);
    }
    bool IsInUnitSquare(Point point)
    {
        double x = point.GetX();
        double y = point.GetY();
        return 0 <= x && x <= 1 && 0 <= y && y <= 1;
    }
    // Returns the inverse of the image matrix, or null if it cannot be inverted
    AffineTransform Inverse(Matrix ctm)
    {
        try
        {
            AffineTransform t = new AffineTransform(
                ctm.Get(Matrix.I11), ctm.Get(Matrix.I12),
                ctm.Get(Matrix.I21), ctm.Get(Matrix.I22),
                ctm.Get(Matrix.I31), ctm.Get(Matrix.I32)
            );
            return t.CreateInverse();
        }
        catch (NoninvertibleTransformException)
        {
            return null;
        }
    }
    public int PageNr { get; private set; }
    public List<HiddenText> HiddenTexts { get; private set; } = new List<HiddenText>();
    List<TextChunk> locationalResult;
}

(Code from the inner class Strategy in ContentAnalyzer.java, and in ContentAnalyzer.cs for .NET.)

Unfortunately, locationalResult is a private member of LocationTextExtractionStrategy, so it is not accessible from a subclass. This can be circumvented by using reflection, as shown above. If your runtime environment does not allow reflection, consider copying the original iText class into your code base and applying the above changes to that copy directly.
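
The geometric check at the heart of the strategy can also be tried in isolation. In PDF, an image is painted into the unit square as mapped by the current transformation matrix, so a point lies under the image exactly when the inverse of that matrix maps it back into the unit square. The following sketch reproduces this test using only java.awt.geom instead of the iText geometry classes; the class UnitSquareSketch and its method are hypothetical names introduced for illustration, and the matrix is assumed to be given as the six numbers [a b c d e f] in PDF order.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

/** Stand-alone illustration of the unit-square test; hypothetical helper, not part of iText. */
final class UnitSquareSketch {

    /**
     * ctm holds the image matrix as [a, b, c, d, e, f] (assumed input format).
     * Returns true if the point (x, y) lies inside the area the image is painted on.
     */
    static boolean isCovered(double[] ctm, double x, double y) {
        AffineTransform t = new AffineTransform(ctm[0], ctm[1], ctm[2], ctm[3], ctm[4], ctm[5]);
        try {
            // Pull the point back into image space, where the image is the unit square
            Point2D p = t.createInverse().transform(new Point2D.Double(x, y), null);
            return 0 <= p.getX() && p.getX() <= 1 && 0 <= p.getY() && p.getY() <= 1;
        } catch (NoninvertibleTransformException e) {
            // A degenerate matrix paints no area, so it cannot cover anything
            return false;
        }
    }
}
```

For an image placed with the matrix [200 0 0 100 50 700], i.e. 200 units wide and 100 units high with its lower left corner at (50, 700), the point (150, 750) is reported as covered and the point (300, 750) is not. Note that, like the strategy above, this only tests a single point per text chunk, so partially covered text can slip through.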

HiddenText is a simple helper class:

// Java
public class HiddenText {
    public HiddenText(int page, Matrix imageMatrix, String text, PdfXObject xobject) {
        this.page = page;
        this.imageMatrix = imageMatrix;
        this.text = text;
        this.xobject = xobject;
    }
    public int getPage() {
        return page;
    }
    public Matrix getImageMatrix() {
        return imageMatrix;
    }
    public String getText() {
        return text;
    }
    public PdfXObject getXobject() {
        return xobject;
    }
    final int page;
    final Matrix imageMatrix;
    final String text;
    final PdfXObject xobject;
}

// C#
public class HiddenText
{
    public HiddenText(int page, Matrix imageMatrix, String text, PdfXObject xobject)
    {
        Page = page;
        ImageMatrix = imageMatrix;
        Text = text;
        Xobject = xobject;
    }
    public int Page { get; private set; }
    public Matrix ImageMatrix { get; private set; }
    public String Text { get; private set; }
    public PdfXObject Xobject { get; private set; }
}

(Code from HiddenText.java and HiddenText.cs for .NET.)

Now you can retrieve a list of images covering some text on a given page: create a Strategy for that page, let iText process the page content with it as the event listener (for example via a PdfCanvasProcessor), and then read the result from getHiddenTexts() (HiddenTexts in .NET).

Using this heuristic, you would then be able to detect PDFs within your application or workflow that might merit some closer inspection.

Detecting malicious intent

It’s one thing to detect overlaid content in a PDF document, but it’s something else entirely to gauge malicious intent on the part of the document’s author or producer. This is harder, if not impossible, to do for an algorithm than it is for a human being. There are completely legitimate use cases for using overlays in PDF. One such case is for PDFs that were processed by OCR software. An OCR tool usually adds the recognized characters in a separate layer underneath the original, scanned image, enabling users to copy and paste text from the document.

With this blog post, we covered the implementation of a heuristic for detecting and flagging PDFs that have the potential to be misused by someone with malicious intent. We envision this heuristic as being part of a larger workflow for vetting incoming documents, also making note of the caveat surrounding the legitimate use of overlays in PDFs.

Bearing in mind that the researchers uncovered two additional methods of achieving a similar goal, in the next post we’ll tackle the implementation of heuristics for both the replace and hide-and-replace Shadow Attacks. Stay tuned!

[1] See <https://www.pdf-insecurity.org/signature/signature.html> and <https://itextpdf.com/en/blog/technical-notes/avoiding-pdf-digital-signature-vulnerabilities-itext> for details.

[2] Vulnerability Report: Attacks bypassing the signature validation in PDF (2020-03-02)
