Does a PDF file have styles, headers and footers?

Does a PDF file store style, header and footer information, the way a docx file does in its separate XML files?

Posted on StackOverflow on Jan 21, 2014 by Prakhar

Regular PDFs don't have styles, only fonts (for instance, Helvetica is one font and Helvetica-Bold is another font of the same family). They don't have headers and footers, just as they don't have paragraphs, section titles, table rows or table cells. Everything you see on a PDF page is just a bunch of glyphs, paths and shapes drawn on a canvas.
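
To make the "only fonts, no styles" point concrete, here is a minimal Python sketch that walks a tiny, hand-written page content stream. The stream, the `fonts` mapping (a stand-in for the page's font resources) and the `text_runs` helper are all made up for illustration. It assumes an uncompressed stream with simple literal strings, and it ignores escaped parentheses, hex strings, `TJ` arrays and font encodings, so it is not a real text extractor.

```python
import re

# Hand-written example of an uncompressed page content stream (illustrative only).
content_stream = (
    b"BT /F1 12 Tf 72 720 Td (Normal text) Tj "
    b"/F2 12 Tf ( and bold text) Tj ET"
)

# Hypothetical stand-in for the page's font resources: resource name -> font name.
fonts = {"F1": "Helvetica", "F2": "Helvetica-Bold"}

# Matches either a font selection ("/F1 12 Tf") or a simple shown string ("(...) Tj").
TOKEN = re.compile(rb"/(\w+)\s+[\d.]+\s+Tf|\(([^()]*)\)\s*Tj")


def text_runs(stream, fonts):
    """Return (font name, text) pairs in drawing order."""
    runs = []
    current_font = None
    for match in TOKEN.finditer(stream):
        if match.group(1) is not None:
            # Tf operator: switch the current font.
            current_font = fonts[match.group(1).decode("ascii")]
        else:
            # Tj operator: text drawn with whatever font is current.
            runs.append((current_font, match.group(2).decode("latin-1")))
    return runs


for font, text in text_runs(content_stream, fonts):
    print(font, repr(text))
```

Nothing in the stream says "bold" or "emphasis". The second run is simply drawn with a different font, and any notion of style has to be inferred from that.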

However, if your PDF is a Tagged PDF, it contains something known as the StructTreeRoot. This means that, apart from the presentation of the content, you also have a tree structure that stores the semantics of the content. This structure contains references to the content on the different pages, allowing you (for instance) to find out which lines belong together in a paragraph, which parts of the page are "artifacts" (such as a repeating page header or a footer with a page number), which content is organized as a table, and so on.
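
The relationship between page content and the structure tree can be sketched with a toy model. This is a simplification under stated assumptions, not a PDF library API: `StructElem`, `page_content`, `paragraphs` and `artifacts` are hypothetical names, the sample text is invented, and the model assumes that real content carries a marked-content ID (MCID) that structure elements refer to, while artifacts are flagged in the page content and left out of the tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructElem:
    """A node in a simplified structure tree (hypothetical model)."""
    role: str                                   # e.g. "Document", "P", "Table"
    mcids: List[int] = field(default_factory=list)      # marked content it refers to
    kids: List["StructElem"] = field(default_factory=list)


# What the page holds: (tag, MCID, text) in drawing order. Sample data is invented.
page_content = [
    ("Artifact", None, "Annual Report"),        # repeating page header
    ("P", 0, "Regular PDFs don't have"),
    ("P", 1, "styles, only fonts."),
    ("P", 2, "Tagged PDFs add a tree."),
    ("Artifact", None, "Page 1"),               # footer with page number
]

# The structure tree groups MCIDs 0 and 1 (two drawn lines) into one paragraph.
tree = StructElem("Document", kids=[
    StructElem("P", mcids=[0, 1]),
    StructElem("P", mcids=[2]),
])


def paragraphs(root, content):
    """Collect paragraph text by following MCID references from the tree."""
    text_by_mcid = {mcid: text for _, mcid, text in content if mcid is not None}
    result = []

    def walk(elem):
        if elem.role == "P":
            result.append(" ".join(text_by_mcid[m] for m in elem.mcids))
        for kid in elem.kids:
            walk(kid)

    walk(root)
    return result


def artifacts(content):
    """Return the text of everything flagged as an artifact on the page."""
    return [text for tag, _, text in content if tag == "Artifact"]


print(paragraphs(tree, page_content))
print(artifacts(page_content))
```

Without the tree, the three drawn lines are indistinguishable from the header and footer text. With it, the first two lines can be joined into one paragraph and the artifacts set aside.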

Tagged PDF is a requirement for PDF/A Level A and PDF/UA documents. A majority of the PDF files you can find in the wild aren't tagged (or aren't tagged properly).
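
If you want a quick first guess at whether a given file is tagged, a rough heuristic is to look for the `/StructTreeRoot` key and a `/Marked true` entry in the raw bytes. The `looks_tagged` helper below is a hypothetical sketch, not a reliable test: it gives false negatives when the document catalog is stored in a compressed object stream, can give false positives if those byte sequences happen to appear elsewhere, and says nothing about whether the tagging is done properly. A PDF library that parses the catalog is the dependable route.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic only: True if the raw bytes mention a structure tree root
    and a 'Marked true' flag. Compressed catalogs will be missed."""
    has_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_tree and is_marked


# Synthetic catalog fragments for illustration, not complete PDF files.
tagged = (b"1 0 obj << /Type /Catalog /MarkInfo << /Marked true >> "
          b"/StructTreeRoot 5 0 R >> endobj")
untagged = b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj"

print(looks_tagged(tagged), looks_tagged(untagged))
```

For a real file you would pass in the bytes read from disk, for example `looks_tagged(open(path, "rb").read())`, where `path` is whatever file you want to inspect.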
