Why is the text I extract from an English PDF page garbled?

I'm trying to extract English text from a PDF and print it to the console. I'm using iText's PdfTextExtractor class for the extraction, but the text I get back is unreadable. This is my extraction code:


Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
    new FileOutputStream(OUTPUTFILE));
PdfReader reader = new PdfReader(input);
int n = reader.getNumberOfPages();
PdfImportedPage page;
// Go through all pages (page numbers start at 1)
for (int i = 1; i <= n; i++) {
    String str = PdfTextExtractor.getTextFromPage(reader, i);
    System.out.println(str);
}

The output I get on the console is unreadable, even though the text in the PDF is in English:

t cotenn dna o mntoafinir yales r ni et h layhcsip Amgteu end y Retila m eysts e erefcern emsyst o f et h se. ru I n tioi, dnda etseh orpvedi eddda e ulav o se vdcie ollaw na s tiouquibu cacess o t latoutenxc e rpap dna t ilagid otten tofoi. nmirna ni soitaoli n mor f chea e. roth s iTh s i a cel ra csea ewerh " eth lweoh is ermo nath eth ms u fo sti rtasp ".

Can anybody tell me how to extract the English text so that it reads the same as in the source PDF?

Posted on StackOverflow on May 16, 2014 by codechefvaibhavkashyap

If you want the text to be ordered based on its position on the page, you need to introduce a specific strategy, such as the LocationTextExtractionStrategy:

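for (int i = 1; i <= n; i++) {
    String str = PdfTextExtractor.getTextFromPage(reader, i, new LocationTextExtractionStrategy());
    System.out.println(str);
}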

The LocationTextExtractionStrategy sometimes produces odd sentences, particularly when the letters 'dance' on the page (the baseline of the glyphs differs for text on the same line). In that case, you can try the SimpleTextExtractionStrategy, which returns the text in the order in which it appears in the PDF content stream.
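
To make the difference between the two orderings concrete, here is a toy model in plain Java. It is not iText code and does not reproduce how either strategy is implemented: the ReadingOrderSketch class, its Chunk type and both methods are hypothetical names made up for this illustration, and the exact-equality check on the baseline is a deliberate simplification. It only shows why sorting by position can repair text that was drawn out of order, and why the same sorting can break text whose baseline shifts slightly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of extraction order. Hypothetical names; not part of iText. */
class ReadingOrderSketch {

    /** A piece of text plus the position where its baseline starts. */
    static final class Chunk {
        final String text;
        final double x; // horizontal start position
        final double y; // baseline height; a larger y is higher on the page

        Chunk(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Joins the chunks in the order they were drawn (content stream order). */
    static String inStreamOrder(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            sb.append(c.text);
        }
        return sb.toString();
    }

    /**
     * Sorts the chunks top to bottom, then left to right, and starts a new
     * line whenever the baseline differs from that of the previous chunk.
     * Assumption: exact equality of y is a simplification for this sketch.
     */
    static String inLocationOrder(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));
        StringBuilder sb = new StringBuilder();
        Chunk previous = null;
        for (Chunk c : sorted) {
            if (previous != null && c.y != previous.y) {
                sb.append('\n');
            }
            sb.append(c.text);
            previous = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One line of text whose pieces were drawn out of reading order.
        List<Chunk> shuffled = Arrays.asList(
                new Chunk("lo ", 20, 700), new Chunk("Hel", 0, 700),
                new Chunk("ld", 60, 700), new Chunk("wor", 40, 700));
        System.out.println(inStreamOrder(shuffled));
        System.out.println(inLocationOrder(shuffled));

        // One word drawn in reading order, but with a 'dancing' baseline.
        List<Chunk> dancing = Arrays.asList(
                new Chunk("Hel", 0, 700), new Chunk("lo", 20, 700.5));
        System.out.println(inStreamOrder(dancing));
        System.out.println(inLocationOrder(dancing));
    }
}
```

For the shuffled line, the stream order gives "lo Helldwor" while the location order gives "Hello world". For the dancing word it is the other way round: the stream order gives "Hello", but the location order puts "lo" on a line of its own above "Hel", because its baseline is half a unit higher. Which strategy works better therefore depends on how the PDF was produced.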
