What are the extra characters in the font name of my PDF?

While extracting font names from a PDF, I get some junk characters followed by a plus sign, and then the font name with the font style. I want to remove the junk characters. I get them only for a few PDF files. For example: MMLPEO+RemingtonNoiseless


string curFont = renderInfo.GetFont().PostscriptFontName;
Posted on StackOverflow on May 16, 2013 by pdp

The "junk" characters indicate that only a subset of the font is embedded, not the complete font. You may find names such as ABCDEF+RemingtonNoiseless, UVWXYZ+RemingtonNoiseless, and so on, meaning that there may be different subsets of the same font inside the PDF.

For an explanation, see section 9.6.4, "Font Subsets", of the PDF specification ISO 32000-1:2008:

For a font subset, the PostScript name of the font — the value of the font’s BaseFont entry and the font descriptor’s FontName entry — shall begin with a tag followed by a plus sign (+). The tag shall consist of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file shall have different tags.

EXAMPLE EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font.

In other words, these characters aren't merely "junk". Removing them is straightforward with ordinary string manipulation, but be aware that doing so throws away information that may be useful in some contexts, such as telling different subsets of the same font apart.
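As a minimal sketch of that string manipulation, the helper below splits a name into its subset tag and base font name, relying only on the tag format quoted above (exactly six uppercase letters followed by a plus sign). The class SubsetFontName and its methods Strip and GetTag are hypothetical names made up for this illustration; they are not part of iText.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of iText.
public static class SubsetFontName
{
    // Tag format from the quoted spec text: six uppercase letters, then '+',
    // then the actual font name.
    private static readonly Regex TagPattern = new Regex(@"^([A-Z]{6})\+(.+)$");

    // Returns the font name without its subset tag, or the name unchanged
    // when it does not start with a subset tag.
    public static string Strip(string postscriptName)
    {
        if (postscriptName == null)
            throw new ArgumentNullException(nameof(postscriptName));
        Match m = TagPattern.Match(postscriptName);
        return m.Success ? m.Groups[2].Value : postscriptName;
    }

    // Returns the six-letter subset tag, or null when the name has none.
    public static string GetTag(string postscriptName)
    {
        if (postscriptName == null)
            throw new ArgumentNullException(nameof(postscriptName));
        Match m = TagPattern.Match(postscriptName);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With this in place, the line from the question could become string curFont = SubsetFontName.Strip(renderInfo.GetFont().PostscriptFontName); keeping the result of GetTag alongside it preserves the ability to distinguish one subset from another. Matching the full six-letter pattern, rather than cutting at the first plus sign, leaves names that are not subsets untouched.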
