Can I generate a PDF from a URL instead of from a file on disk?

This question was asked on Stack Overflow on Aug 14, '17 by Srinivas Ch

You can generate a PDF from any HTML InputStream. In most of the examples, we used a FileInputStream, but in chapter 4, we created reports that existed only in memory as a byte[]; in that case, we used a ByteArrayInputStream. We can also use an InputStream that was created from a URL object.
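
To make the three options concrete, here is a minimal sketch that uses only the JDK. The class HtmlSources and its method names are hypothetical helpers for illustration, not part of iText; each method returns an InputStream that could be handed to the converter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative and not part of iText.
class HtmlSources {

    // HTML stored in a file on disk.
    static InputStream fromFile(Path htmlFile) throws IOException {
        return Files.newInputStream(htmlFile);
    }

    // HTML that exists only in memory, for example a generated report.
    static InputStream fromMemory(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    // HTML served at a URL; openStream() connects and returns the response body.
    static InputStream fromUrl(URL url) throws IOException {
        return url.openStream();
    }
}
```

Whichever source is used, the caller ends up with the same type, so the conversion code does not need to know where the HTML came from.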

Suppose that we use this URL:

public static final String ADDRESS = "https://stackoverflow.com/help/on-topic";

If we open this URL in a browser, we see the following page:

The Stack Overflow page in the browser

In the C07E04_CreateFromURL example, we use ADDRESS to create a Java URL object:

new C07E04_CreateFromURL().createPdf(new URL(ADDRESS), DEST);

We use the following createPdf() method:

public void createPdf(URL url, String dest) throws IOException {
    HtmlConverter.convertToPdf(url.openStream(), new FileOutputStream(dest));
}

The openStream() method gives us an InputStream from which iText reads the HTML. Obviously, this only works on a machine that has access to the internet.
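
Because the HTML now comes from the network, it can be useful to fail fast when the server is unreachable or slow. The following is a minimal sketch using only the JDK. The class RemoteHtml, its method names, and the timeout values are assumptions made for illustration, and readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper; the class name and timeout values are illustrative.
class RemoteHtml {

    // Downloads the whole response body, giving up if connecting or reading takes too long.
    static byte[] download(URL url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(connectTimeoutMs);
        connection.setReadTimeout(readTimeoutMs);
        try (InputStream in = connection.getInputStream()) {
            return in.readAllBytes(); // requires Java 9 or later
        }
    }

    // Wraps the downloaded bytes so they can be passed wherever an InputStream is expected.
    static InputStream asStream(byte[] html) {
        return new ByteArrayInputStream(html);
    }
}
```

The stream returned by RemoteHtml.asStream(RemoteHtml.download(url, 5000, 10000)) could then take the place of url.openStream() in createPdf(). The timeouts in this sketch apply only to the download of the HTML document itself.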

For pages with lots of pictures, it can take a while for iText to download all the resources, but this help page from Stack Overflow should load quickly, and the result will look like this:

The Stack Overflow page rendered to A4 pages in PDF

Maybe an A4 page isn't the ideal page size for a web page, because the complete sidebar is missing. Let's adapt the example, and introduce a media query.

The createPdf() method of the C07E05_CreateFromURL2.java example looks like this:

public void createPdf(URL url, String dest) throws IOException {
    PdfWriter writer = new PdfWriter(dest);
    PdfDocument pdf = new PdfDocument(writer);
    PageSize pageSize = new PageSize(850, 1700);
    pdf.setDefaultPageSize(pageSize);
    ConverterProperties properties = new ConverterProperties();
    MediaDeviceDescription mediaDeviceDescription =
        new MediaDeviceDescription(MediaType.SCREEN);
    mediaDeviceDescription.setWidth(pageSize.getWidth());
    properties.setMediaDeviceDescription(mediaDeviceDescription);
    HtmlConverter.convertToPdf(url.openStream(), pdf, properties);
}

We use a custom page size of 850 by 1700 user units, and we use the Screen media type as done in chapter 2. Now the content fits the page, and we get a much better result:
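
To get a feel for these dimensions, the sketch below converts user units to inches and millimetres. It assumes the PDF default of 72 user units per inch; the class UserUnits is a hypothetical helper, not part of iText.

```java
// Hypothetical helper; not part of iText.
class UserUnits {

    // Assumption: the default PDF user unit of 1/72 inch.
    static final double UNITS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    // Converts a length in user units to inches.
    static double toInches(double userUnits) {
        return userUnits / UNITS_PER_INCH;
    }

    // Converts a length in user units to millimetres.
    static double toMillimetres(double userUnits) {
        return toInches(userUnits) * MM_PER_INCH;
    }
}
```

Under that assumption, 850 by 1700 user units is roughly 11.8 by 23.6 inches (about 300 by 600 mm). An A4 page is 595 by 842 user units (210 by 297 mm), so the custom page is considerably wider and about twice as tall.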

The Stack Overflow page rendered to custom-sized pages in PDF

Sure, there are still some imperfections. For instance, the items in the header bar are shown as a list instead of as items in a menu bar. We plan to solve these issues in future versions of pdfHTML.

We could also have used the media type PRINT instead of SCREEN. See the C07E06_CreateFromURL3 example:

public void createPdf(URL url, String dest) throws IOException {
    ConverterProperties properties = new ConverterProperties();
    MediaDeviceDescription mediaDeviceDescription =
        new MediaDeviceDescription(MediaType.PRINT);
    properties.setMediaDeviceDescription(mediaDeviceDescription);
    HtmlConverter.convertToPdf(url.openStream(), new FileOutputStream(dest), properties);
}

Because of the print.css used by Stack Overflow, we now get a couple of bare-bones pages from which the sidebar is deliberately omitted. Maybe that's exactly what we want:

The Stack Overflow page rendered to A4 pages in PDF

Important: pdfHTML is a work in progress. If you have ever printed a web page to paper from a browser, you will have noticed that the results aren't always as good as you'd want them to be. The same is true when using pdfHTML as a URL2PDF tool. Most HTML pages aren't meant to be printed, but we are making a continuous effort to improve the conversion process in pdfHTML.


