pdf2Data

Intro

iText 7 pdf2Data for PDF processing

pdf2Data is an iText 7 add-on for Java and C# (.NET) that allows you to easily extract data from PDF documents.

It offers a framework to recognize data inside PDF documents, based on selection rules that you define in a template.

How it works

Try the example below yourself in the online demo:

Data extraction

Java:

// Make sure to load license file before invoking any code
LicenseKey.loadLicenseFile(pathToLicenseFile);
 
// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);
 
// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);
 
// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);
 
// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);

C# (.NET):

// Make sure to load license file before invoking any code
LicenseKey.LoadLicenseFile(pathToLicenseFile);
 
// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.ParseTemplateFromPDF(pathToPdfTemplate);
 
// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);
 
// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.Recognize(pathToFileToParse);
 
// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.SaveToXML(pathToOutXmlFile);

Template creation

First, create a template PDF from a sample document by defining selectors, which combine areas of interest with selection rules. You can do this in the intuitive pdf2Data template editor, which is offered as a web application.

Benefits

Why use iText 7 pdf2Data?

Data is an important commodity, and you may have more than you realize locked inside your PDF documents.

Of course, collecting this data manually would take you a lot of time, and increase the risk of input errors as well as security issues.

With pdf2Data you can automate the process of extracting data in a secure way.

 

Automate PDF data extraction from PDF invoices, forms and other documents

Extract and process data from small or large volumes of PDFs by defining the information that matters to your data processes in a template. You can automate the extraction programmatically in Java or .NET (C#), which makes PDF data extraction and data processing highly efficient.

Define which specific data you want to target for PDF data extraction

Easily define the information you want to extract in a template with the pdf2Data template editor. pdf2Data works with all kinds of PDF documents, such as invoices, forms, and reports, and makes PDF data processing a highly efficient part of your workflow.

Integrate automated PDF data extraction into your existing document process

pdf2Data uses open standards, which makes integrating it into existing workflows easy and fast. It includes SDKs for Java and .NET (C#) as well as a command line interface. PDF data processing for the 21st century.

Key features

Core capabilities of iText 7 pdf2Data

pdf2Data works with a template in which you define the areas, fonts, patterns, or tables of interest. One template is used for all PDFs created in the same format, such as invoices or other commercial documents.

You define these areas of interest with selectors.

Each selector identifies the relevant information in a different way, and selectors can be used alone or in combination to meet your needs.
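To make the selector idea more concrete, here is a minimal conceptual sketch in plain Java. It does not use the pdf2Data API: the `TextChunk` and `Selector` classes, the `extract` method, the coordinates, and the sample invoice text are all hypothetical and exist only for illustration. It assumes the page text is already available as chunks with page coordinates, and shows how an area of interest combined with a pattern rule could pick out named fields.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual model only: none of these types belong to the pdf2Data API.
public class SelectorSketch {

    /** A piece of page text with the position of its lower-left corner, in points. */
    static final class TextChunk {
        final String text;
        final float x;
        final float y;

        TextChunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical selector: a rectangular area of interest plus a pattern rule. */
    static final class Selector {
        final String fieldName;
        final float left, bottom, right, top;
        final Pattern pattern;

        Selector(String fieldName, float left, float bottom,
                 float right, float top, String regex) {
            this.fieldName = fieldName;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
            this.pattern = Pattern.compile(regex);
        }

        /** True when the chunk's anchor point lies inside the area of interest. */
        boolean covers(TextChunk chunk) {
            return chunk.x >= left && chunk.x <= right
                && chunk.y >= bottom && chunk.y <= top;
        }
    }

    /** For each selector, keeps the first chunk that is inside its area and matches its pattern. */
    static Map<String, String> extract(List<Selector> selectors, List<TextChunk> chunks) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Selector selector : selectors) {
            for (TextChunk chunk : chunks) {
                if (!selector.covers(chunk)) {
                    continue;
                }
                Matcher m = selector.pattern.matcher(chunk.text);
                if (m.find()) {
                    // Prefer the first capturing group when the rule defines one
                    fields.put(selector.fieldName, m.groupCount() > 0 ? m.group(1) : m.group());
                    break;
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample content standing in for text extracted from an invoice page
        List<TextChunk> chunks = Arrays.asList(
            new TextChunk("Invoice No. 2024-0017", 400f, 780f),
            new TextChunk("Total due: 1,250.00 EUR", 400f, 120f),
            new TextChunk("Page 1 of 1", 280f, 30f));

        // The "template": one selector per field of interest
        List<Selector> template = Arrays.asList(
            new Selector("invoiceNumber", 350f, 750f, 595f, 842f, "Invoice No\\.\\s*(\\S+)"),
            new Selector("total", 350f, 100f, 595f, 150f, "Total due:\\s*([\\d.,]+)"));

        System.out.println(extract(template, chunks));
    }
}
```

The same list of selectors can be reused for every document that shares the layout, which is the idea behind defining a template once per document format. In pdf2Data itself, selectors are defined in the template editor rather than in code like this.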

Extract data from PDF documents

Leverage iText 7 Core content extraction for high-fidelity recognition of text and images in your PDF data processing.

Intuitive extraction configuration

This add-on has comprehensive out-of-the-box functionality, with the flexibility to extend and customize it and a focus on easy integration and open standards.

Use templates to streamline extraction

Define areas of interest and selection rules to get exactly the content you need.

Integrate in your PDF and/or data workflow

Data is output in a structured, reusable format for further processing, with access to the page coordinates of the extracted content.
