iText 7 pdf2Data for PDF processing
pdf2Data is an iText 7 add-on for Java and C# (.NET) that allows you to easily extract data from PDF documents.
It offers a framework to recognize data inside PDF documents, based on selection rules that you define in a template.
How it works
Try the example below yourself with the online demo:
Java:

// Make sure to load license file before invoking any code
LicenseKey.loadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);
C# (.NET):

// Make sure to load license file before invoking any code
LicenseKey.LoadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.ParseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.Recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.SaveToXML(pathToOutXmlFile);
First, create a template PDF based on a sample document by defining selectors, using areas of interest and selection rules. You can do this in the intuitive pdf2Data template editor, which is offered as a web application.
Here you will find the needed resources to install and use pdf2Data.
Why use iText 7 pdf2Data?
Data is an important commodity, and you may have more than you realize locked inside your PDF documents.
Collecting this data manually takes a lot of time and increases the risk of input errors and security issues.
With pdf2Data you can automate the process of extracting data in a secure way.
Automate PDF data extraction from PDF invoices, forms and other documents
Extract and process data from small or large volumes of PDFs by defining, in a template, the information that matters to your data processes. Automate PDF data extraction programmatically in Java and .NET (C#) for efficient data extraction and processing.
Define which specific data you want to target for PDF data extraction
Define the information you want to extract in a template with the pdf2Data template editor. pdf2Data works with all PDF documents, such as invoices, forms, and reports, and makes PDF data processing an efficient part of your workflow.
Integrate automated PDF data extraction into your existing document process
pdf2Data uses open standards, which makes integrating it into existing workflows fast and straightforward. It includes SDKs for Java and .NET (C#) as well as a command line interface. PDF data processing for the 21st century.
Core capabilities of iText 7 pdf2Data
With pdf2Data, you define the areas, fonts, patterns, or tables of interest in a template, which is then used for all PDFs created in the same format, such as invoices or other commercial documents.
You define these areas of interest with selectors.
Each selector identifies the relevant information in a different way, and selectors can be used alone or in combination to meet your needs.
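To make the idea of combining selectors concrete, here is a minimal, self-contained Java sketch. It does not use the pdf2Data API: the `TextChunk` class, the `inArea` and `matching` helpers, and the sample coordinates are all hypothetical names and values invented for illustration. It only shows how an area rule and a pattern rule might each narrow down text on a page, and how using them together narrows it further.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Conceptual sketch only: none of these types belong to pdf2Data.
public class SelectorSketch {

    /** A piece of text and the page position of its anchor point (hypothetical model). */
    static final class TextChunk {
        final String text;
        final float x;
        final float y;

        TextChunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Area rule: keeps chunks whose anchor point lies inside the given rectangle. */
    static Predicate<TextChunk> inArea(float left, float bottom, float width, float height) {
        return c -> c.x >= left && c.x <= left + width
                && c.y >= bottom && c.y <= bottom + height;
    }

    /** Pattern rule: keeps chunks whose entire text matches the regular expression. */
    static Predicate<TextChunk> matching(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return c -> pattern.matcher(c.text).matches();
    }

    /** Applies a rule to the chunks of one page and returns the selected text in page order. */
    static List<String> select(List<TextChunk> page, Predicate<TextChunk> rule) {
        List<String> selected = new ArrayList<>();
        for (TextChunk chunk : page) {
            if (rule.test(chunk)) {
                selected.add(chunk.text);
            }
        }
        return selected;
    }

    /** Made-up page content standing in for text extracted from an invoice. */
    static List<TextChunk> samplePage() {
        return Arrays.asList(
                new TextChunk("Invoice", 50f, 780f),
                new TextChunk("INV-2041", 400f, 780f),  // invoice number in the header
                new TextChunk("Total", 50f, 100f),
                new TextChunk("INV-0001", 50f, 60f));   // similar-looking reference in the footer
    }

    public static void main(String[] args) {
        List<TextChunk> page = samplePage();
        Predicate<TextChunk> headerArea = inArea(300f, 750f, 250f, 60f);
        Predicate<TextChunk> invoiceNumber = matching("INV-\\d{4}");

        // The pattern alone also picks up the footer reference.
        System.out.println(select(page, invoiceNumber));                 // [INV-2041, INV-0001]
        // Combining it with the area rule leaves only the header value.
        System.out.println(select(page, headerArea.and(invoiceNumber))); // [INV-2041]
    }
}
```

Modelling each rule as a `Predicate` keeps the rules independent, so they can be applied alone or chained with `and`, which mirrors the "alone or in combination" behaviour described above. How pdf2Data implements its selectors internally is not covered here.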
Extract data from PDF documents
Leverage iText 7 Core content extraction for high-fidelity recognition of text and images during PDF data processing.
Intuitive extraction configuration
This add-on offers comprehensive out-of-the-box functionality, with the flexibility to extend and customize it. It focuses on easy integration and open standards.
Use templates to streamline extraction
Define areas of interest and selection rules to get exactly the content you need.
Integrate in your PDF and/or data workflow
Data is output in a structured, reusable format for further processing, with access to the page coordinates of the extracted content.