iText launches iText pdfOCR, a powerful open source product enabling text recognition in scanned documents and conversion into editable PDFs

Tue - 06/30/2020

Ghent, Belgium – June 30, 2020. iText Group NV, a globally recognized thought leader and innovator in PDF libraries and solutions, today announced the launch of iText pdfOCR, the newest addition to its award-winning software offering.

iText pdfOCR, which is part of the renowned iText 7 PDF SDK, offers Optical Character Recognition (OCR) functionality to convert printed text in scanned documents and images into a fully searchable PDF/A-3u compliant format (PDF version 1.7), making that text easier and faster to access. Without machine-readable text, printed or scanned documents cannot be searched, indexed or interpreted. Logical follow-up actions include data extraction with iText pdf2Data, secure content redaction with iText pdfSweep, or multilanguage document recreation with iText pdfCalligraph. Repurposing the data with the low-code document generator iText DITO® is often the cherry on the cake.

The iText pdfOCR add-on is built on the Tesseract OCR engine. Tesseract supports over 100 languages. It was originally developed by Hewlett-Packard starting in 1985 and was released under the Apache open source license in 2005. Since 2006, its development has been sponsored by Google.

"With COVID-19 urging companies to accelerate their digital transformation projects, organizations are forced to explore new ways of accessing and managing their data – existing and new. Being a leader in the digital documents space, we’re pleased to be at the forefront of this new era. As such, I am very proud to announce the latest addition to our PDF library for today’s new world: thanks to the OCR capabilities of iText pdfOCR many new opportunities will open up for users and enterprises that want to maximize their data potential." Yeonsu Rosa Kim, CEO at iText Group NV, stated.

"Staying true to our open-source roots, we’ve decided to build iText pdfOCR upon the open-source Tesseract OCR Engine. With this, we wish to reconfirm our positioning as an open-source company - a value which is appreciated by our millions of users and clients." Yeonsu Kim added.

“With this new addition to our PDF library, developers will now be able to leverage data locked away in documents which until now weren’t accessible. Our latest product enables them to enlarge their digital workflow capabilities by accessing the data buried in scanned files and deploy it for any action or purpose they or their end-user would like.” Tony Van den Zegel, VP of Products & Marketing at iText Group NV and General Manager at iText Software Belgium, said.

iText pdfOCR has a wide range of applications: for instance, archiving historical documents, translating legal documents, automating data entry when processing all sorts of paper applications or claims, and sorting printed or scanned documents that would otherwise not be editable.

Please tune in for live demos on 9 July 2020. More information is available on the pdfOCR webinar page.

About iText

iText is a global leader in innovative PDF software. Its award-winning products are used by millions of users, both open source and commercial. Its diverse customer base includes many Fortune 500 companies in sectors ranging from technology and finance to travel and healthcare, as well as small companies and government agencies. Headquartered in Belgium, iText also has offices in Asia (Singapore and South Korea) and in the USA (Boston).

www.itextpdf.com
