Why can't I extract text added using a Type3 font correctly from a PDF?


I have an Arabic PDF file whose text uses a Type 3 font. When I extract the text using PDFBox, some characters are empty and their font is null. What is the problem?

protected void processTextPosition(TextPosition text) {
    String character = text.getCharacter();     // is empty
    String font = text.getFont().getBaseFont(); // is null
}

The stream produced with iText looks like this: ( dJ� v{d W�cG�)Tj

Why do I get the characters in this format?

The characters that come out as question marks appear in my stream as "SOH-STX-ETX-EOT", not as a single character. The characters inside the PDF are shown as 'd' and 'J'!

Posted on StackOverflow on Feb 9, 2014 by Ayman Younis

A Type 3 font is a user-defined font. For instance, a user can define that the character 'P' corresponds to the symbol for "The Artist Formerly Known As Prince", which is a glyph but not a letter from any known alphabet:

The TAFKAP symbol

A glyph in a Type 3 font is a series of lines and shapes, and there's no way for a program such as iText or PDFBox to determine which character was meant. It is only normal that you get a question mark.
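To make this concrete, here is a minimal sketch in plain Java of why extraction yields control codes instead of letters. It is a toy model, not PDFBox or iText code: the class name Type3FontSketch, the glyph table and the path operators in it are all hypothetical, and it assumes the font carries no extra information that maps character codes back to text.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (hypothetical, not a real library API) of a Type 3 font. */
class Type3FontSketch {

    // Names of the first few ASCII control codes, used only for display.
    static final String[] CONTROL_NAMES = {"NUL", "SOH", "STX", "ETX", "EOT"};

    // Hypothetical glyph table: character code -> drawing instructions.
    // Nothing in it says which letter, if any, a shape stands for.
    static final Map<Integer, String> GLYPH_PROCS = new LinkedHashMap<>();
    static {
        GLYPH_PROCS.put(1, "0 0 m 500 700 l 1000 0 l h f");
        GLYPH_PROCS.put(2, "0 0 m 0 700 l 400 700 l h f");
        GLYPH_PROCS.put(3, "100 0 m 900 0 l 900 100 l h f");
        GLYPH_PROCS.put(4, "0 350 m 1000 350 l 1000 450 l h f");
    }

    /** A viewer only needs the drawing instructions, so rendering works. */
    static String render(byte[] shownString) {
        StringBuilder sb = new StringBuilder();
        for (byte b : shownString) {
            sb.append(GLYPH_PROCS.get(b & 0xFF)).append('\n');
        }
        return sb.toString();
    }

    /** An extractor has nothing but the raw codes to report as "text". */
    static String extractRaw(byte[] shownString) {
        return new String(shownString, StandardCharsets.ISO_8859_1);
    }

    /** Spells out the raw codes so the control characters become visible. */
    static String describe(byte[] shownString) {
        StringBuilder sb = new StringBuilder();
        for (byte b : shownString) {
            int code = b & 0xFF;
            if (sb.length() > 0) {
                sb.append('-');
            }
            sb.append(code < CONTROL_NAMES.length
                    ? CONTROL_NAMES[code]
                    : String.format("0x%02X", code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] shown = {1, 2, 3, 4};              // operand of a text-showing operator
        System.out.print(render(shown));          // four shapes, drawn correctly
        System.out.println(describe(shown));      // SOH-STX-ETX-EOT
    }
}
```

In this model the page displays correctly because each code finds its shape, while the "text" that comes out is just the codes 1 to 4, which is consistent with the SOH-STX-ETX-EOT sequence described in the question.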

A PDF that contains Type 3 fonts usually does so for one of the following reasons:

  1. The font was used to introduce symbols that don't exist in any font.
  2. The font was used to obfuscate the content of the PDF so that its content can't be extracted.
  3. The PDF wasn't created in an elegant way.

If the Type 3 font was used for normal characters, you'll need to use OCR to convert the content to normal text.



