While extracting the font name from a PDF, I get some junk characters, followed by a plus sign and then the font name with the font style.
I want to remove the junk characters. I get them only for a few PDF files, for example:
string curFont = renderInfo.GetFont().PostscriptFontName;
The "junk" characters indicate that the font isn't embedded completely: only a subset of it is. You'll find names such as ABC123+RemingtonNoiseless, XYZ456+RemingtonNoiseless, etc., meaning that there may be different subsets of the same font inside the PDF.
For an explanation, have a look at section 9.6.4, "Font Subsets", of the PDF specification ISO 32000-1:2008:
For a font subset, the PostScript name of the font — the value of the font’s BaseFont entry and the font descriptor’s FontName entry — shall begin with a tag followed by a plus sign (+). The tag shall consist of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file shall have different tags.
EXAMPLE EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font.
In other words, these characters aren't merely "junk". Removing them is straightforward with ordinary string manipulation, but be aware that doing so throws away information that may be useful in some contexts.
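As a rough illustration of that string manipulation, here is a minimal sketch in plain Java (no iText dependency). The class name FontSubsetNames and its two methods are hypothetical helpers made up for this example, not part of any iText API. It assumes the tag follows the rule quoted above (exactly six uppercase letters, then a plus sign); a name whose prefix does not match that rule is returned unchanged. The tag is exposed separately so that the subset information isn't lost.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only; not an iText API.
public final class FontSubsetNames {

    // Six uppercase letters, a plus sign, then the base font name,
    // following the rule in ISO 32000-1:2008, section 9.6.4.
    private static final Pattern SUBSET_NAME =
            Pattern.compile("^([A-Z]{6})\\+(.+)$");

    private FontSubsetNames() {
    }

    /** Returns the name without its subset tag, or the input unchanged if it has none. */
    public static String stripSubsetTag(String postScriptName) {
        if (postScriptName == null) {
            return null;
        }
        Matcher m = SUBSET_NAME.matcher(postScriptName);
        return m.matches() ? m.group(2) : postScriptName;
    }

    /** Returns the six-letter subset tag, or null if the name carries no such tag. */
    public static String subsetTag(String postScriptName) {
        if (postScriptName == null) {
            return null;
        }
        Matcher m = SUBSET_NAME.matcher(postScriptName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String name = "EOODIA+Poetica";
        System.out.println(stripSubsetTag(name)); // Poetica
        System.out.println(subsetTag(name));      // EOODIA
    }
}
```

Matching the full six-letter pattern, rather than cutting at the first plus sign, avoids mangling a font name that happens to contain a plus sign but is not a subset name. If your PDFs contain prefixes that don't follow the specification, you would need to loosen the pattern accordingly.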
In iText 7, to get the PostScript font name you'll need:
Where font is a