PDF/UA-2 is Here! Introducing the New Standard for PDF Universal Accessibility

Tue - 09/10/2024

The brand-new PDF/UA-2 ISO standard was published just a few months ago, and of course iText Core already supports the creation of PDF/UA-2 documents. In this article we look at the history of PDF accessibility and at what has changed in the new standard.


If you keep up to date with PDF or accessibility news, you probably saw the announcement of a new ISO standard for universally accessible PDF documents. ISO 14289-2 (PDF/UA-2) was published on March 15th, 2024, and builds upon the foundation laid by PDF/UA-1, enhancing the accessibility of PDF documents – not just for those with disabilities, but for a broader range of users and applications.

As you may have noticed, support for PDF/UA-2 was included with the release of version 8.0.3 of iText. To learn more about the new standard and how we got here, read on.

The Road to Accessible PDF

The original PDF specification from 1993 was developed to represent documents in a reliable and device-independent manner, with each page described purely by its visual appearance. However, this meant that users with disabilities had no way to access the content unless they could see it.

Time does not stand still, however, and neither did the PDF specification. The first steps towards universally accessible PDFs came in PDF 1.3, with the introduction of an underlying document structure. Version 1.4 in 2001 then introduced the concept of “Tagged PDF”, which enabled assistive technologies such as screen readers to present PDF content to visually impaired users. It also enabled text reflowing, allowing users to increase text size without needing to scroll documents horizontally.

Why Tagged PDF is Important

Tagged PDF is incredibly useful for creating well-structured documents which are accessible to all. Since it provides a way to identify headings, paragraphs, alternative text for images and much more, it makes accurate data extraction and content reuse much easier for SDKs like iText, and other Apryse solutions like the Apryse IDP (Intelligent Document Processing) module.

For an illustrative example, let’s look at a very simple untagged PDF consisting of just two lines of text. Then we’ll add some simple tags to classify the text as a heading and a paragraph. If we compare the two in a viewer such as Acrobat, we can see the tags we added in the second version:

A comparison of an untagged and a tagged PDF in Acrobat. <H1> and <P> tags have been added in the second example.

We can then take a closer look at our comparison documents in iText’s PDF debugging tool RUPS, to see the differences in the PDF structure:

Viewing the difference in PDF structure between the untagged and tagged documents in RUPS.
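
To make the comparison concrete, here is a minimal sketch of what the tags add, written as a toy model in plain Java. It does not use the iText API: the StructElem and TaggedPdfSketch classes and the sample strings are hypothetical names invented for this illustration. The idea is simply that a tagged file exposes a tree of roles that software can walk in logical order, while an untagged file only offers raw text runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one node in a tagged PDF's structure tree (not the iText API). */
final class StructElem {
    final String role;   // structure type such as "Document", "H1" or "P"
    final String text;   // text of the tagged content, or null for a pure container
    final List<StructElem> kids = new ArrayList<>();

    StructElem(String role, String text) {
        this.role = role;
        this.text = text;
    }

    /** Appends a child and returns this node so calls can be chained. */
    StructElem add(StructElem kid) {
        kids.add(kid);
        return this;
    }
}

final class TaggedPdfSketch {
    /** Walks the tree depth-first and returns one "role: text" line per content-bearing element. */
    static List<String> readAloud(StructElem root) {
        List<String> lines = new ArrayList<>();
        collect(root, lines);
        return lines;
    }

    private static void collect(StructElem elem, List<String> lines) {
        if (elem.text != null) {
            lines.add(elem.role + ": " + elem.text);
        }
        for (StructElem kid : elem.kids) {
            collect(kid, lines);
        }
    }

    /** Builds the two-element example: a Document containing a heading and a paragraph. */
    static StructElem sampleDocument() {
        return new StructElem("Document", null)
                .add(new StructElem("H1", "Hello accessibility"))
                .add(new StructElem("P", "This paragraph is tagged."));
    }

    public static void main(String[] args) {
        // An untagged file only offers the raw text runs, with no roles attached.
        List<String> untagged = List.of("Hello accessibility", "This paragraph is tagged.");
        System.out.println("Untagged: " + untagged);
        System.out.println("Tagged:   " + readAloud(sampleDocument()));
    }
}
```

With the roles attached, a screen reader can announce the first line as a heading, and an extraction tool can pull out headings without guessing from font sizes.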

The PDF Association has a comprehensive list of questions and answers about Tagged PDF, and how it forms the basis for reuse and accessibility in PDF. It’s an invaluable resource if you want to learn more about the subject. However, for now let us focus on how the PDF/UA family of standards came to be.

PDF/UA-1: The First ISO Standard for Universally Accessible PDF

With the publication of the Web Content Accessibility Guidelines (WCAG), generic accessibility norms were established for web content, including PDF. Subsequently, PDF/UA-1 was developed as a complementary standard to WCAG 2.0 to define accessibility requirements specific to PDF. Based on the PDF 1.7 specification, it was first published in 2012 and refined in 2014 as ISO 14289-1. PDF/UA-1 provided definitive terms and requirements for accessibility in PDF documents and applications, and as such, it has been widely adopted by institutions and government agencies around the world.

In the US for example, Section 508 of the Rehabilitation Act mandates the accessibility of electronic and information technology used by the federal government to people with disabilities, and specifies PDF/UA-1 as a requirement for PDF authoring tools. In addition, the Library of Congress lists PDF/UA-1 as a preferred digital format that is also suitable for long-term preservation – since unencrypted PDF/UA documents can also conform to the PDF/A archival standard.

Note that since PDF/UA and WCAG complement each other, the Americans with Disabilities Act (ADA), Accessible Canada Act (ACA), the European Accessibility Act (EAA), and other regulations which cite WCAG as a requirement are compatible with PDF/UA-compliant documents for PDF content. So, making sure your documents are PDF/UA compliant is a safe bet.

Over on the Apryse site we recently published an article on the many advantages of accessible PDFs, and how software like the Apryse WebViewer can benefit from such well-structured and tagged PDFs. In addition to fostering inclusivity for those with disabilities, accessible PDFs boost search engine optimization (SEO), enable easier data extraction and reuse, and much more.

What Does PDF/UA-2 Bring to the Table?

PDF/UA-2 can take advantage of the numerous enhancements to Tagged PDF, particularly in PDF 2.0 and subsequent additions. While the ISO 14289-2 specification must be purchased from ISO, both the Well Tagged PDF (WTPDF) and PDF 2.0 specifications are made available at no cost – the latter thanks to a collaboration between the PDF Association, Apryse, and other leading PDF companies.

As an active participant in the PDF Association, its PDF Working Groups, and the ISO committees for PDF, Apryse has made significant contributions towards the development of PDF/UA-2 and related PDF standards. This is in addition to our sponsorship of ISO 32000-2, which enables all PDF software developers to leverage the benefits of PDF 2.0 free of charge.

It’s important to note that ISO 14289-2 (PDF/UA-2) does not replace ISO 14289-1 (PDF/UA-1). PDF/UA-2 is the standard for accessible PDF files written against ISO 32000-2 (PDF 2.0), while PDF/UA-1 remains the ISO standard for accessible PDF files written against ISO 32000-1 (PDF 1.7). While PDF/UA-2 introduces new features and requirements, the laws and regulations which specify compliance with the PDF/UA-1 standard mean it will stay relevant for years to come.

In short, PDF/UA-1 is still a perfectly cromulent standard for creating accessible PDF documents, whether you’re generating accessible forms using a PDF SDK like iText, or automated data-driven documents and reports using a template solution like Fluent.

With that said, here are the major improvements of PDF/UA-2, with asterisks showing those which take advantage of features introduced in PDF 2.0:

  • Comprehensive requirements for structure element attributes and examples of semantically-significant attribute usage
  • Comprehensive requirements for annotations
  • Rules governing the relationship between structure elements defined in PDF 1.7 and those defined in PDF 2.0
  • Support for structure elements defined in PDF 2.0, including Title, DocumentFragment, Aside, FENote, Artifact and more*
  • Support for MathML
  • Requirements regarding the use of structure destinations with intra-document links*
  • The use of PDF 2.0’s Associated Files feature to facilitate the integration of non-PDF content*
  • Support for modern Unicode*
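
As a rough illustration of why the structure elements in the list matter when choosing a target standard, the sketch below flags the roles in a document that the list names as PDF 2.0 additions. It is plain Java with no PDF library involved; the Pdf20RoleCheck class is a hypothetical name, and the set contains only the five element names mentioned above, so it is deliberately incomplete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class Pdf20RoleCheck {
    // Structure element names the list above cites as defined in PDF 2.0.
    // The article says "and more", so this set is deliberately incomplete.
    static final Set<String> PDF_2_0_ROLES =
            Set.of("Title", "DocumentFragment", "Aside", "FENote", "Artifact");

    /** Returns the roles from usedRoles found in PDF_2_0_ROLES, in input order, without duplicates. */
    static List<String> rolesNeedingPdf20(List<String> usedRoles) {
        List<String> found = new ArrayList<>();
        for (String role : usedRoles) {
            if (PDF_2_0_ROLES.contains(role) && !found.contains(role)) {
                found.add(role);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> roles = List.of("Document", "Title", "P", "Aside", "P", "Title");
        System.out.println(rolesNeedingPdf20(roles)); // prints [Title, Aside]
    }
}
```

A non-empty result suggests the document relies on PDF 2.0 structure elements, which points towards PDF/UA-2 rather than PDF/UA-1 as the conformance target.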

PDF/UA-2 Support in iText Core

If you want to be able to create or manipulate PDF/UA-2 compliant documents, great news! We added support for PDF/UA-2 creation and validation in our February release of iText Core – Apryse’s leading open-source PDF library. You can find PDF/UA-2 code examples in our Java and .NET GitHub sample repositories.

However, the PDF/UA-2 standard is also based on the PDF Association’s new Well Tagged PDF (WTPDF) specification. Since there is a large overlap in the requirements for accessibility and reuse, this specification aims to help software and document authors target these use cases by defining appropriate conformance levels for PDF 2.0 documents.

As the PDF Association states, “If you support WTPDF you also support PDF/UA-2” since the two standards are fully compatible. You can therefore rest assured that creating files conforming to the accessibility conformance level defined in WTPDF will also meet the new “gold standard” for accessible PDF.

As a demonstration, you can find a link to an example document showing iText’s conformance with the specification on the WTPDF page on the PDF Association site, or our public repository on GitHub.

Simplifying PDF/UA-2 Creation

As noted earlier, creating well-structured tagged PDFs has benefits, even if you don’t specifically need to achieve compliance with archiving or accessibility regulations. A convenient way to create such documents with iText is by using HTML or XML templates and converting them to PDF using the pdfHTML add-on for iText Core. By doing this, you can utilize the semantic and structural information of the HTML/CSS, and pdfHTML will map them to the high-level objects and styles used by the iText Core layout and rendering engine.

Part of iText’s layout module, this engine is a fundamental and powerful component of the iText Core library. Designed to transform abstract high-level elements such as Paragraphs, Tables, and Lists into the basic syntax of PDF, it enables developers to define content in familiar HTML-like structures without needing to worry about the PDF format. The module's versatility allows for the creation of complex PDF documents, while also managing the intricate details of PDF generation, such as text rendering and page breaks. For a more comprehensive overview, you can refer to the documentation in our Java and .NET GitHub repositories.

You can find information on creating Tagged PDF in Chapter 1 of our pdfHTML tutorial, while Chapters 3 and 4 go into more detail on creation of archivable and accessible PDFs. Naturally, we utilized pdfHTML’s capabilities to create our WTPDF document linked above, and you can check out the code example on the iText Knowledge Base to learn more.
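
To illustrate the general idea of carrying HTML semantics over into PDF tags, here is a small sketch in plain Java. It is not pdfHTML's implementation: the HtmlToTagSketch class, the pairs in the map, and the fallback to "Div" are assumptions chosen for this example, using well-known standard structure types such as H1, P, L, LI, Table and Figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class HtmlToTagSketch {
    // Illustrative pairs of HTML element names and standard PDF structure types.
    // This is an assumption for the example, not pdfHTML's real mapping table.
    static final Map<String, String> ROLE_BY_HTML_TAG = Map.of(
            "h1", "H1",
            "h2", "H2",
            "p", "P",
            "ul", "L",
            "li", "LI",
            "table", "Table",
            "img", "Figure");

    /** Maps each HTML element name to a structure type; unknown names fall back to "Div". */
    static List<String> toRoles(List<String> htmlTags) {
        List<String> roles = new ArrayList<>();
        for (String tag : htmlTags) {
            String key = tag.toLowerCase(Locale.ROOT); // HTML element names are case-insensitive
            roles.add(ROLE_BY_HTML_TAG.getOrDefault(key, "Div"));
        }
        return roles;
    }

    public static void main(String[] args) {
        System.out.println(toRoles(List.of("h1", "p", "ul", "li", "img")));
    }
}
```

The point of the sketch is that a well-structured HTML template already carries most of the information a tagged PDF needs, so the author does not have to assign roles by hand.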

A Step Forward for Universal Accessibility

As iText is a leading open-source PDF SDK, we believe in the importance of driving support for new standards in the open-source community. Our early adoption of new standards is well established: iText featured early support for PDF/UA-1 back in 2013, and PDF 2.0 has been supported since 2017. Recent releases introduced support for the new PDF 2.0 digital signature extensions, and the latest PDF/A-4 standard for archiving.

We’ve also made general improvements to iText’s PDF/A and PDF/UA support, with extra checks and helper logic being added in version 8.0.3, and additional APIs for more user-friendly creation of accessible PDFs.

Before we wrap up this article, we think it's important to recognize the efforts of ISO, the PDF Association, and the PDF Reuse and PDF/UA Technical Working Groups who contributed to the development of PDF/UA-2. This new standard is a testament to the collaborative effort of these experts in accessibility, technology, and standards. It reflects a shared vision of a world where content is not just available, but accessible to all.

In conclusion, PDF/UA-2 is more than just a new standard; it's a step forward in the ongoing mission to make the digital world accessible to everyone. It's a reminder that as technology progresses, so must our commitment to inclusivity and accessibility. PDF/UA-2 moves us closer to a future where content is universally accessible, and the barriers to information are a thing of the past.
