White paper

pdfSweep: what it does and why you need it

Intro

Today, it is easy to distribute documents among large numbers of people with diverse needs. You can also grant or restrict access to the content of these documents to keep information in the right hands. Both company policies and international regulations may require redaction of sensitive information before archiving or sharing documents with all users. Companies and organizations need tools to implement this redaction process in a flexible and automated way, with absolute guarantees regarding the elimination of the targeted elements. pdfSweep can help.

What is pdfSweep?

pdfSweep is an iText add-on that removes (redacts) sensitive information from a PDF (Portable Document Format) document. Confidentiality is assured, because the redacted information cannot be recovered. In a secure two-step process, pdfSweep deletes text and images at user-defined coordinates, or as defined by a regular expression. After parsing the rendering information in the original PDF document, pdfSweep creates a new document without the redacted content.
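The idea of redacting at user-defined coordinates can be illustrated with a small conceptual sketch. It is not pdfSweep's API: the `Rect` and `Chunk` types, the `find_targets` and `rebuild` functions, and the sample page content are all hypothetical, and a page is modeled as a plain list of positioned chunks rather than a real PDF content stream. The sketch assumes one possible reading of the description above: first parse the content to find what falls inside a redaction area, then build a new page that never contains it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return not (self.x1 <= other.x0 or other.x1 <= self.x0
                    or self.y1 <= other.y0 or other.y1 <= self.y0)


@dataclass(frozen=True)
class Chunk:
    """A piece of rendered content (text or image) and where it is drawn."""
    content: str
    box: Rect


def find_targets(page: list[Chunk], areas: list[Rect]) -> set[int]:
    """Step 1: scan the page and collect indexes of chunks touching a redaction area."""
    return {i for i, chunk in enumerate(page)
            if any(chunk.box.intersects(area) for area in areas)}


def rebuild(page: list[Chunk], targets: set[int]) -> list[Chunk]:
    """Step 2: create a new page that never contains the targeted chunks."""
    return [chunk for i, chunk in enumerate(page) if i not in targets]


# Made-up sample content, for illustration only.
page = [
    Chunk("Patient:", Rect(50, 700, 110, 715)),
    Chunk("Jane Doe", Rect(115, 700, 180, 715)),
    Chunk("Result: negative", Rect(50, 680, 170, 695)),
]
redaction_areas = [Rect(112, 698, 185, 717)]

redacted_page = rebuild(page, find_targets(page, redaction_areas))
print([chunk.content for chunk in redacted_page])
```

Because the second step copies only the untargeted chunks, the redacted value is absent from the result instead of being covered by something drawn on top of it. This simplified model drops whole chunks; how a real tool treats content that only partly overlaps a redaction area is outside the scope of this sketch.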

Why do we need redaction?

Redaction can be useful whenever the publisher or author of a document wishes to take out certain information. Common use cases include:

  • Freedom of Information Act (USA) and similar legislation in other countries
  • Governmental declassification procedures
  • Proprietary information
  • Trade secrets
  • General Data Protection Regulation (Europe)
  • All data that would impact the privacy of people
    • Social security numbers (USA)
    • National register identification numbers
    • Phone numbers
    • Bank account details
    • Names of people in a clinical trial
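Some of the items above, such as social security numbers and phone numbers, follow predictable formats, which is what makes redaction by regular expression practical. The sketch below is a plain-text illustration only: the patterns, the `find_sensitive` and `strip_sensitive` helpers, and the sample string are assumptions made for this example, not part of pdfSweep, and real-world formats vary far more than these two patterns cover.

```python
import re

# Illustrative patterns only (assumptions): real identifiers vary by country and format.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(hits)]


def strip_sensitive(text: str) -> str:
    """Delete every match so the value is absent from the output, not merely hidden."""
    for pattern in PATTERNS.values():
        text = pattern.sub("", text)
    return text


# Made-up sample values, for illustration only.
sample = "SSN 123-45-6789, phone (202) 555-0143."
print(find_sensitive(sample))
print(strip_sensitive(sample))
```

Items without a fixed format, such as names of people in a clinical trial, cannot be captured by a simple pattern like these and would need another way of locating them, for example explicit coordinates.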

A short history of redaction

In the past, ‘redaction’ simply involved printing a document, blacking out the necessary information and making a photocopy of the document. That way, the information covered by dark ink simply did not get copied. This worked because paper is a simple WYSIWYG format: there is no hidden data and no metadata that needs to be erased.
