Not so long ago, the only options for businesses and public authorities to store documents for the long term in a reproducible format were physical media such as paper, microfilms, and microfiches. However, the advent of the digital age meant new solutions were needed. Those who first needed to store documents in a future-proof digital format used the popular image format TIFF (Tagged Image File Format) since this was the standard format for fax machines (remember those?) and scanners. TIFF is still used in many legacy archiving systems, but has its limitations:
- The TIFF raster format contains no text-based information, meaning files cannot be searched by their text content.
- TIFFs containing color images or pages will become significantly larger, and efficiently compressing anything except black-and-white line drawings is difficult.
- Contrary to popular belief, TIFF is not an ISO standard. Things like resolution, color and metadata settings are mostly left to the individual user’s discretion.
The introduction of PDF in 1993 saw the format quickly gain popularity, and users and developers began to recognize its potential for long-term archiving. There was room for improvement though, so in 2002, work began to develop a purpose-built file format for standardized archiving. This initiative involved specialists from libraries and archives, from administrative bodies, from industry and from the judicial system.
A working group within ISO (the International Organization for Standardization) was set up, and ISO published PDF/A-1 on the 1st of October 2005 as the world’s first standard file format for digital long-term archiving. Since then, three further parts of the standard have been released: PDF/A-2 (in 2011), PDF/A-3 (in 2012), and PDF/A-4 (in late 2020).
Why do we need PDF/A?
While the PDF format on its own does not guarantee long-term legibility or complete independence from software, PDF/A is designed so that a document can still be read without problems decades later. It is a subset of the PDF standard, meaning functionality that is not suited to archiving has been removed. In addition, it forbids certain things that could hinder long-term archiving and imposes requirements that guarantee reliable reproduction of the file. PDF/A requires that files be self-describing, so all information necessary to read the document (such as the specific fonts used) is embedded directly in the file.
These features are essential for document archives, which require that content always appears exactly the same under all circumstances. Conforming viewing applications must therefore display PDF/A documents exactly as they were intended to be seen.
PDF/A is a widely accepted standard for digital preservation and archiving
The PDF/A standard is a vital component of many institutional repository file format policies. The formats chosen for these policies are typically open, non-proprietary, and widely available for long-term archival use. Factors for a format’s inclusion in such policies include its longevity and maturity, its adoption in relevant professional communities, its incorporation of information standards, and the long-term accessibility of any required viewing software.
Renowned archive institutions like the Smithsonian list PDF/A as a recommended format because of its widely documented acceptance by the archival and digital preservation communities. Other prominent institutions such as the National Archives of the Netherlands, The National Archives of the UK, Stanford University, and the Digital Preservation Coalition also indicate PDF/A as a preferred archiving format.
The lion’s share of data creators, data managers and digital archivists around the globe will tell you PDF/A is their preferred file format for archiving, whether for legal compliance, auditing, or research reasons. Any organization or business which is looking into preserving electronic records should realize from these testimonials how important it is to make their records compliant with the PDF/A specification.
If you would like a more detailed rundown of PDF/A versions, their various conformance levels, and the multitude of potential PDF/A use cases, we recommend the free ebook "PDF/A: digital documents to withstand the sands of time".
How do I create PDF/A files?
Many modern PDF and office applications will allow you to export documents as PDF/A. If you need high-volume document processing capabilities though, iText 7 Core can help your organization create, manipulate, and process documents into PDF/A compliant files at scale. It also allows you to build PDF/A capabilities directly into your own applications.
If you require more specific PDF functionalities to supplement your data preservation activities, have a look at the broad range of iText 7 add-ons which enable optical character recognition, secure redaction, XFA to PDF, intelligent data extraction, advanced typography features, PDF compression, rendering PDF to images, PDF creation from HTML, and converting MS Office documents to PDF.
7 tips to safeguard the future of your electronic records:
1. Migration: Converting files to preservation standards formats like PDF/A will increase their longevity.
2. File names: Use logical, descriptive, and consistent file names dated in year-month-day format. Avoid spaces, periods, and special characters.
3. Version Control: Add draft and revision numbers to the file name.
4. Organization: Create logical and hierarchical folder structures. For example, group by category or date.
5. Copies: Follow the LOCKSS (Lots of Copies Keep Stuff Safe) philosophy and save more than three copies on multiple devices or servers in multiple locations.
6. Review and Refresh: Different storage media have different life spans. Check your files annually and replace media or migrate if necessary.
7. Integrity: Digital signatures ensure that a file cannot be altered without invalidating the signature, which makes any change to the record detectable. They can provide a range of valuable capabilities, from tamper protection to authentication and revocation.
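Tips 2 and 3 are easy to automate. The Python sketch below is one possible way to do it, not a prescribed convention: the function name `archival_file_name`, the underscore separators, the lowercase ASCII slug, and the two-digit `v02`-style revision suffix are all assumptions made for illustration. The only period it keeps is the one before the file extension.

```python
import re
from datetime import date


def archival_file_name(title: str, created: date, version: int,
                       extension: str = "pdf") -> str:
    """Build a name of the form YYYY-MM-DD_descriptive-title_vNN.ext.

    Hypothetical helper: the layout is an illustrative assumption,
    not a rule taken from any standard.
    """
    # Lowercase the title, then collapse every run of characters other than
    # ASCII letters and digits (spaces, periods, punctuation) into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one ASCII letter or digit")
    if version < 1:
        raise ValueError("version must be 1 or greater")
    # date.isoformat() yields year-month-day, so names sort chronologically.
    return f"{created.isoformat()}_{slug}_v{version:02d}.{extension}"


if __name__ == "__main__":
    print(archival_file_name("Annual Report: Q3 Results", date(2021, 10, 28), 3))
    # prints "2021-10-28_annual-report-q3-results_v03.pdf"
```

Because the date comes first and is zero-padded, a plain alphabetical listing of a folder also orders the files chronologically, and the revision suffix keeps drafts of the same document next to each other.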
Do you want to truly understand PDF/A?
iText just released a brand-new ebook on PDF/A. Learn more about the standard’s details, popular use cases, areas of application across various industries, and an archiving case study featuring medical imaging specialist Zeiss. We’ll even take it one step further and provide you with a tutorial on creating PDF/A with the iText 7 Library. To celebrate the release, we'll guide you through the depths of PDF/A in a live webinar on October 28, 2021.