How to use a text extraction strategy after applying a location extraction strategy?

Would creating a new method or class called FontBasedTextExtractionStrategy instead of a simple TextExtractionStrategy help?

31st May 2016
admin-marketing

I used the following code to extract text from a particular location in a PDF:

Rectangle rect = new Rectangle(0, 0, 250, 250);
RenderFilter filter = new RegionTextRenderFilter(rect);
fontBasedTextExtractionStrategy strategy = new fontBasedTextExtractionStrategy();
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter); // Throws Error.

I want to get the bold text present in that location. Would creating a new method or class called FontBasedTextExtractionStrategy instead of a simple TextExtractionStrategy help?

Posted on StackOverflow on Jul 1, 2014 by Raka

Please take a look at the ParseCustom example for iText 7. In this example, we don't write a new ITextExtractionStrategy; instead, we create a custom TextRegionEventFilter:

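import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.TextRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.filter.TextRegionEventFilter;

// Accepts text that lies inside filterRect AND is set in a bold or oblique font
class FontFilter extends TextRegionEventFilter {
    public FontFilter(Rectangle filterRect) {
        super(filterRect);
    }

    @Override
    public boolean accept(IEventData data, EventType type) {
        // The parent class checks the region and rejects non-text events
        if (!super.accept(data, type)) {
            return false;
        }
        TextRenderInfo renderInfo = (TextRenderInfo) data;
        PdfFont font = renderInfo.getFont();
        if (font == null) {
            return false;
        }
        // PostScript font name, e.g. "Helvetica-Bold" or "ABCDEF+Helvetica-Bold"
        String fontName = font.getFontProgram().getFontNames().getFontName();
        return fontName != null && (fontName.endsWith("Bold") || fontName.endsWith("Oblique"));
    }
}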

This filter keeps only text whose PostScript font name ends with "Bold" or "Oblique". Note that "Oblique" also matches italic variants such as Helvetica-Oblique; drop that test if you only want bold text.
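Matching on the end of the name is only a heuristic. Embedded font subsets carry a six-letter tag such as ABCDEF+ in front of the name, which doesn't affect endsWith(), but many fonts put the style elsewhere in the name: Arial-BoldMT, for instance, does not end with "Bold". The plain-Java sketch below compares the endsWith() test with a looser check. FontStyleCheck, stripSubsetPrefix() and looksBold() are hypothetical helpers, not iText API, and the looser rule is an assumption about common naming patterns rather than a guarantee.

```java
import java.util.Locale;

// Hypothetical helper (not part of iText): guesses the style of a font
// from its PostScript name.
final class FontStyleCheck {
    private FontStyleCheck() {}

    // Removes a subset tag ("ABCDEF+"): six uppercase letters followed by '+'.
    static String stripSubsetPrefix(String fontName) {
        int plus = fontName.indexOf('+');
        if (plus == 6 && fontName.substring(0, 6).chars().allMatch(c -> c >= 'A' && c <= 'Z')) {
            return fontName.substring(7);
        }
        return fontName;
    }

    // The same test as in FontFilter: the name ends with "Bold" or "Oblique".
    static boolean endsWithBoldOrOblique(String fontName) {
        return fontName.endsWith("Bold") || fontName.endsWith("Oblique");
    }

    // Looser (assumed) rule: "bold" appears in the style part after the last '-' or ','.
    static boolean looksBold(String fontName) {
        String name = stripSubsetPrefix(fontName);
        int sep = Math.max(name.lastIndexOf('-'), name.lastIndexOf(','));
        String style = sep >= 0 ? name.substring(sep + 1) : name;
        return style.toLowerCase(Locale.ROOT).contains("bold");
    }
}
```

If it suits your documents, you could call looksBold(fontName) inside accept() instead of the endsWith() test; stripping the subset tag first keeps a tag like BOLDAB+ from producing a false match.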

This is how you use this filter:

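import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.FilteredEventListener;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;
import java.io.IOException;

public void parse(String src) throws IOException {
    // try-with-resources closes the document even if parsing fails
    try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(src))) {
        // Region to search: x, y, width, height (in user-space units)
        Rectangle rect = new Rectangle(36, 750, 523, 56);
        FontFilter fontFilter = new FontFilter(rect);
        FilteredEventListener listener = new FilteredEventListener();
        // The strategy only receives the events that fontFilter accepts
        LocationTextExtractionStrategy extractionStrategy =
                listener.attachEventListener(new LocationTextExtractionStrategy(), fontFilter);
        // Parse the first page (page numbers start at 1)
        new PdfCanvasProcessor(listener).processPageContent(pdfDoc.getPage(1));
        System.out.println(extractionStrategy.getResultantText());
    }
}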

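As you can see, we don't need a new extraction strategy. We attach a standard LocationTextExtractionStrategy to a FilteredEventListener together with our font-based filter, so the strategy only receives the text that passes the filter. PdfCanvasProcessor.processPageContent() parses the page and sends every render event through the listener, and getResultantText() returns the text that was collected.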
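The filter-then-collect idea can be modeled in plain Java: filters decide which events get through, and the collector only assembles what it receives. In this sketch, TextEvent, FilterPipeline, inRegion() and boldOrOblique() are hypothetical stand-ins, not iText classes, and the region test is simplified to a point-in-rectangle check on the baseline start (TextRegionEventFilter itself tests whether the text baseline intersects the rectangle). As with FilteredEventListener, whose attachEventListener() accepts several IEventFilter arguments, an event is forwarded only when every filter accepts it, so the region check and the font check could also be kept as two separate filters.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for one text render event (not an iText class):
// the rendered string, its PostScript font name and its baseline start point.
final class TextEvent {
    final String text;
    final String fontName;
    final float x;
    final float y;

    TextEvent(String text, String fontName, float x, float y) {
        this.text = text;
        this.fontName = fontName;
        this.x = x;
        this.y = y;
    }
}

// Plain-Java model of "filtered listener + extraction strategy":
// an event reaches the collector only if EVERY filter accepts it.
final class FilterPipeline {
    private FilterPipeline() {}

    static String extract(List<TextEvent> events, List<Predicate<TextEvent>> filters) {
        StringBuilder collected = new StringBuilder();
        for (TextEvent event : events) {
            boolean accepted = filters.stream().allMatch(filter -> filter.test(event));
            if (accepted) {
                if (collected.length() > 0) {
                    collected.append(' ');
                }
                collected.append(event.text);
            }
        }
        return collected.toString();
    }

    // Simplified region test: baseline start lies inside (x, y, width, height).
    static Predicate<TextEvent> inRegion(float rx, float ry, float width, float height) {
        return e -> e.x >= rx && e.x <= rx + width && e.y >= ry && e.y <= ry + height;
    }

    // Same font test as FontFilter.
    static Predicate<TextEvent> boldOrOblique() {
        return e -> e.fontName.endsWith("Bold") || e.fontName.endsWith("Oblique");
    }
}
```

With the region from the parse() example (36, 750, 523, 56), a bold title inside the rectangle is kept, while regular text inside it and bold text outside it are both dropped.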

Click this link if you want to see how to answer this question in iText 5.

