How to do HTML to XML conversion to generate closed tags?

When I try converting HTML to PDF using iText and XML Worker, I'm asked to give the closing tag for tags such as <br>. It works if I do this manually, but I don't want to add each closing tag manually. How can I do this in an automated way?

Posted on StackOverflow on Oct 30, 2014 by Kannu Verma

You are experiencing this problem because you are feeding HTML to iText's XML Worker. XML Worker requires XML, so you need to convert your HTML into XHTML.

There is an example of how to do this conversion here: D00_XHTML

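import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public static void tidyUp(String path) throws IOException {
    File html = new File(path);
    Document doc = Jsoup.parse(html, "US-ASCII");
    // Recent jsoup versions serialize as HTML by default (<br> stays unclosed); ask for XML syntax
    doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    byte[] xhtml = doc.html().getBytes();
    File dir = new File("results/xml");
    dir.mkdirs();
    // try-with-resources closes the stream even if the write fails
    try (FileOutputStream fos = new FileOutputStream(new File(dir, html.getName()))) {
        fos.write(xhtml);
    }
}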

In this example, we get the path to an ordinary HTML file (similar to what you have). We then use the jsoup library to parse the HTML and serialize it as XHTML into a byte array. Here, that byte array is written to disk as an XHTML file, but you can also use it directly as input for XML Worker, for instance by wrapping it in a ByteArrayInputStream and passing that stream to XMLWorkerHelper's parseXHtml() method.
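
If you want to confirm that the tidied byte array really is well-formed before handing it to XML Worker, a quick check with the XML parser that ships with the JDK can help. The sketch below is only an illustration: the class name WellFormedCheck and the method isWellFormed are made up for this example, and the JDK parser is a stand-in that may be stricter than XML Worker's own parser on edge cases. It also assumes the default JDK parser implementation, which recognizes the feature used to skip external DTD downloads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of iText or jsoup.
public class WellFormedCheck {

    /** Returns true if the bytes parse as well-formed XML, false otherwise. */
    public static boolean isWellFormed(byte[] markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the JDK's built-in parser is in use. This feature keeps
            // it from downloading the DTD named in an XHTML DOCTYPE.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] html = "<p>line one<br>line two</p>".getBytes(StandardCharsets.UTF_8);
        byte[] xhtml = "<p>line one<br />line two</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(isWellFormed(html));  // unclosed <br>: prints false
        System.out.println(isWellFormed(xhtml)); // closed <br />: prints true
    }
}
```

Calling isWellFormed on the array returned by the jsoup step gives an early, readable signal when some markup still is not valid XML, instead of a parse error in the middle of PDF generation.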
