iText pdfSweep
pdfSweep is an iText Core add-on for Java and C# (.NET) that removes (redacts) information from a PDF document in a reliable and secure way.
How it works
With just a few lines of code you can use the powerful PDF redaction capabilities of pdfSweep to irretrievably remove content. The following example will find and redact all instances of the word "Alice" in a document, regardless of casing:
// Java: SRC and DEST are the paths of the input and output PDF
try (PdfDocument pdf = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST))) {
    // Match "Alice" in any casing and paint the redacted areas pink
    final ICleanupStrategy cleanupStrategy = new RegexBasedCleanupStrategy(
            Pattern.compile("Alice", Pattern.CASE_INSENSITIVE)).setRedactionColor(ColorConstants.PINK);
    PdfCleaner.autoSweepCleanUp(pdf, cleanupStrategy);
}
// C# (.NET): SRC and DEST are the paths of the input and output PDF
PdfDocument pdf = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST));
ICleanupStrategy cleanupStrategy = new RegexBasedCleanupStrategy(new Regex(@"Alice", RegexOptions.IgnoreCase)).SetRedactionColor(ColorConstants.PINK);
PdfCleaner.AutoSweepCleanUp(pdf, cleanupStrategy);
pdf.Close();
The original PDF
The redacted PDF
Core capabilities of the iText pdfSweep redaction tool
pdfSweep works alongside iText Core's document stamping and watermarking tools. After you place a digital "blackout bar" over sensitive text, an image or part of an image, pdfSweep rewrites the document's rendering instructions so that the hidden content can no longer be extracted. This works for both text and images, affording you full information security.
Given the advantages of pdfSweep and the data security it offers, you may find it surprising that it takes only a few lines of code to integrate pdfSweep into your document workflow.
Automatic removal of words and phrases
Remove text from a document, based on patterns like regular expressions.
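Before a pattern is handed to pdfSweep, it can help to check what it actually selects. The following is a minimal sketch using only java.util.regex; the class name RedactionPatterns, the helper matches and the SSN pattern are hypothetical illustrations and are not part of pdfSweep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of pdfSweep.
final class RedactionPatterns {
    // Assumed format: US-style social security numbers such as 123-45-6789.
    static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    // The same pattern as in the "Alice" example: a case-insensitive literal.
    static final Pattern NAME = Pattern.compile("Alice", Pattern.CASE_INSENSITIVE);

    // Returns every substring of the text that the pattern selects, in order.
    static List<String> matches(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "ALICE (SSN 123-45-6789) wrote to alice about order 1234-56-78901.";
        System.out.println(matches(NAME, sample)); // both spellings of the name
        System.out.println(matches(SSN, sample));  // only the well-formed number
    }
}
```

Once a pattern selects what you expect, the compiled Pattern can be passed to RegexBasedCleanupStrategy in the same way as in the "Alice" example, for instance new RegexBasedCleanupStrategy(RedactionPatterns.SSN).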
Customized removal areas
Define your own redaction areas to remove content wherever necessary, just like a digital black bar.
Secure and reliable removal
pdfSweep changes more than the visual appearance rendered when viewing or printing the PDF document. It also takes care of the underlying rendering instructions and data structures to ensure the removed information is not retrievable.
Partial removal of text and images
When content is partially covered by a redaction area, it is only partially removed, allowing you to remove selected parts of text and images.
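The geometric idea behind partial coverage can be sketched as a rectangle intersection. This is an illustration only, not pdfSweep's internal algorithm: the class PartialRedactionSketch, the nested Box type and the method coveredFraction are hypothetical names, and the boxes are assumed to be axis-aligned rectangles in default PDF user space (origin at the lower left, units in points).

```java
// Hypothetical illustration, not part of pdfSweep.
final class PartialRedactionSketch {
    // Axis-aligned box: (x, y) is the lower-left corner, sizes are in points.
    static final class Box {
        final double x;
        final double y;
        final double width;
        final double height;

        Box(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        double area() {
            return width * height;
        }
    }

    // Fraction (0.0 to 1.0) of the content box that lies inside the redaction area.
    static double coveredFraction(Box content, Box redaction) {
        double left = Math.max(content.x, redaction.x);
        double bottom = Math.max(content.y, redaction.y);
        double right = Math.min(content.x + content.width, redaction.x + redaction.width);
        double top = Math.min(content.y + content.height, redaction.y + redaction.height);
        // No overlap (or a degenerate content box) means nothing is covered.
        if (right <= left || top <= bottom || content.area() <= 0) {
            return 0.0;
        }
        return ((right - left) * (top - bottom)) / content.area();
    }

    public static void main(String[] args) {
        Box textLine = new Box(100, 700, 200, 20);  // a line of text
        Box redaction = new Box(200, 690, 200, 40); // covers its right half
        System.out.println(coveredFraction(textLine, redaction));
    }
}
```

A result strictly between 0 and 1 corresponds to the partial case described above: only the covered part of the text or image falls inside the redaction area.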
Why use iText pdfSweep?
pdfSweep is a highly efficient PDF tool for confidential data redaction.
Remove content from your digital documents irretrievably instead of just covering it up. You can also redact text, images, parts of images or drawings for complete confidentiality. iText pdfSweep complies with GDPR for data redaction.
Flexible options
Use recurring data or data fields to automate redaction across any volume of documents, with a set of predefined patterns for common data such as social security numbers, account numbers and ID numbers. Define custom redaction areas using coordinates to redact any content within them.