iText's software libraries are quite technical and the world of PDF comes with its own professional jargon. If you're feeling mystified by some of the PDF vocabulary, you've come to the right place to learn more.

PDF jargon

Document: a document is a piece of written, printed, or electronic matter that provides information or evidence or that serves as an official record.

Portable Document Format: the Portable Document Format (PDF) is a file format for capturing and sending electronic documents. It is described in a series of standards, such as:

  • ISO 32000-2: the most recent core PDF specification,

  • ISO 19005-2 and 19005-3: the most recent standards for the long-term preservation of PDF documents, aka PDF/A,

  • ISO 14289-1: the standard for universal accessibility in the context of PDF, aka PDF/UA,

  • ETSI TS 102 778: a series of standards for PDF Advanced Electronic Signatures (PAdES),

  • ZUGFeRD: a standard for invoices that combines PDF/A-3 and the UN/CEFACT Cross Industry Invoice (CII) standard.

PDF ID: every PDF can be identified by an ID that consists of a pair of identifiers. The first identifier is created when a new PDF file is created, and it's permanent in the sense that it won't change when the PDF is updated. The second identifier is initially identical to the first, but it changes each time the document is updated. The ID of a PDF needs to be unique: two PDF documents with the same ID should be exact copies of each other, containing the exact same bytes in the exact same order. If the two identifiers of a document's ID pair are identical, then you know that the document is a first version. If two different PDF documents share the same first identifier but have different second identifiers, then you know that both documents are related to each other, for example as different versions of the same original file.
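The comparison rules for ID pairs can be sketched in a few lines of Python. This is only an illustration of the logic described above, not iText code: the `PdfId` type and the `is_first_version` and `relation` helpers are hypothetical names, and the sketch assumes the two identifiers have already been read from each file as byte strings.

```python
from typing import Tuple

# Hypothetical representation of a PDF ID: (permanent identifier, changing identifier).
PdfId = Tuple[bytes, bytes]


def is_first_version(pdf_id: PdfId) -> bool:
    """A document is a first version when both identifiers are identical."""
    first, second = pdf_id
    return first == second


def relation(a: PdfId, b: PdfId) -> str:
    """Classify how two documents relate, based only on their ID pairs."""
    if a == b:
        return "same ID"      # the files should be exact copies of each other
    if a[0] == b[0]:
        return "related"      # same first identifier, different second identifier
    return "unrelated"        # different first identifiers


# Made-up identifier values, for illustration only.
original = (b"\xaa\x01", b"\xaa\x01")
updated = (b"\xaa\x01", b"\xbb\x02")
other = (b"\xcc\x03", b"\xcc\x03")

print(is_first_version(original))   # True
print(is_first_version(updated))    # False
print(relation(original, updated))  # related
print(relation(original, other))    # unrelated
```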

iText jargon


API: Application programming interface, a set of definitions and protocols for using external software to build software applications. A good API helps programmers make their work easier and more efficient.



Dematerialization: Decreasing reliance on physical objects for workflow processes and documentation (e.g. cloud services, OCR and paperless office).



Digital transformation: Moving processes and business from the offline to the online space, automating manual tasks and letting digital technologies meaningfully contribute to overall efficiency and business strategies.



Glyph: The smallest graphical unit to represent a character or part of a character (e.g. the many ways of rendering the character ‘a’ are all different glyphs that represent the same character).



Hash and hashing: Several meanings are possible, but in iText terminology a hash is a short, fixed-length value derived from data (such as the bytes of a document) by a hash function. Because even a small change to the data produces a different hash, comparing hashes assures users that their documents have not been altered since the hash was computed. This is important for applications such as digital signatures and authentication.
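As a minimal sketch of how a hash can reveal that data was altered, the Python snippet below compares digests computed with the standard `hashlib` module. The choice of SHA-256 and the sample byte strings are assumptions made for this example; the `digest` helper is a hypothetical name.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Sample content, made up for this illustration.
original = b"Invoice total: 100 EUR"
tampered = b"Invoice total: 900 EUR"

# The hash recorded when the document was known to be good.
recorded = digest(original)

print(digest(original) == recorded)  # True: the content is unchanged
print(digest(tampered) == recorded)  # False: the content was altered
```

In a digital signature, the recorded hash is additionally protected with the signer's private key, so that the hash itself can't be silently replaced.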



High-level: Programming term that describes operations that contain several other operations, comparable to a shortcut that does many things for you automatically.



IDE: Integrated development environment, a generic term for the application in which programmers write, run and debug their code (e.g. Eclipse, Visual Studio).



Kerning: Adjusting the spacing between specific pairs of glyphs in a font to improve the visual flow of the text.



Library: In software terms, one or more packages of code, sometimes used interchangeably with the term SDK (e.g. in Microsoft environments, a DLL file is a library).



Ligature: Two or more graphemes joined in a single glyph (e.g. “æ”).



Low-level: Programming term that describes very detailed, minute operations.



Metadata: Data that usually gets appended to your document (e.g. authorship, keywords) and is always “data about the data”.



OCR: Optical character recognition (e.g. when software recognizes characters in a scanned paper document and converts them to machine-readable text).



PAdES: “PDF Advanced Electronic Signatures”, an ETSI standard for digital signatures.



PDF: Portable Document Format.



PDF/A: ISO-standard PDF format optimized for archiving.



PDF/E: ISO-standard PDF format optimized for engineering, manufacturing and construction.



PDF/UA: ISO-standard PDF format optimized for universal accessibility, intended mostly for people with visual impairments who use assistive technology to read.



PDF/X: ISO-standard PDF format optimized for physical printing.



RUPS: “Reading & Updating PDF Syntax”, a PDF diagnostic tool which is built upon iText Core and allows you to view PDF structure in a Swing GUI. Our developers use RUPS frequently to inspect and debug PDFs received from iText users and customers.



SDK: Software development kit, basically packages of code developers can use in their projects.



Screen reader: Voice technology that reads out content to the user.



Swash: A typographical flourish for aesthetic reasons (e.g. titling).



XFA: XML Forms Architecture, an XML-based format for static or dynamic PDF forms, which iText can help flatten.



XML: Extensible Markup Language. XML data can be contained within a PDF to offer non-human readers a source of important information (e.g. for automated document processes in digital invoices).



ZUGFeRD: “Zentraler User Guide des Forums elektronische Rechnung Deutschland” (“Central User Guide of the Forum for Electronic Invoicing in Germany”), a PDF specification that allows invoices to be processed by both human users and software.

Blockchain jargon

Distributed database: we talk about a distributed database when the storage devices aren't attached to one central processing unit, but are spread across a network. Some examples include:

  • NoSQL databases, with well-known implementations such as MongoDB and CouchDB,

  • Hadoop, which is an open source framework for storing data and running applications on clusters of hardware devices,

  • Distributed Ledger Technology,

  • ...

Node: a node is a connection point in a distributed network that can receive, create, store or send data from and to other nodes in that network.

Ledger: a ledger is a collection of permanent, final, definitive records of transactions.

Ledger record: a ledger record is an entry in the ledger containing information about one or more transactions.

Distributed ledger technology (DLT): DLT is a type of distributed database technology with the following characteristics:

  • The records can be replicated over multiple nodes in a network (decentralized environment),

  • New records can be added by each node, upon consensus reached by other nodes (ranging from one specific authoritative node to potentially every node),

  • Existing records can be validated for integrity, authenticity, and non-repudiation,

  • Existing records can't be removed, nor can their order be changed,

  • The different nodes can act as independent participants that don't necessarily need to trust each other.

Combined, these characteristics make DLT a great way to keep a ledger of records in a trustless environment.

Blockchain: blockchain is a type of DLT in which records are organized in blocks that are appended to a single chain using cryptography and distributed consensus. Each block contains a timestamp and a link to a previous block. This ensures that data in any given block can't be altered retroactively without the alteration of all subsequent blocks. This approach makes blockchain technology a good choice for the recording of events, records management, provenance tracking, and document lifecycle management.
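The linking of blocks can be illustrated with a small Python sketch. It leaves out distributed consensus entirely and only shows why changing an earlier block invalidates the chain. The block layout, the use of SHA-256, and the helper names `block_hash`, `append_block` and `is_valid` are assumptions made for this example, not the format of any real blockchain.

```python
import hashlib
import json
import time
from typing import Dict, List

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder link used by the first block


def block_hash(block: Dict) -> str:
    """Hash the timestamp, the records and the link to the previous block."""
    payload = json.dumps(
        {key: block[key] for key in ("timestamp", "records", "previous_hash")},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict], records: List[str]) -> None:
    """Append a block that links to the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_PREVIOUS_HASH
    block = {
        "timestamp": time.time(),
        "records": records,
        "previous_hash": previous_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain: List[Dict]) -> bool:
    """Check every block's own hash and its link to the previous block."""
    expected_previous = GENESIS_PREVIOUS_HASH
    for block in chain:
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block):
            return False
        expected_previous = block["hash"]
    return True


chain: List[Dict] = []
append_block(chain, ["document created"])
append_block(chain, ["document signed"])
append_block(chain, ["document archived"])
print(is_valid(chain))  # True

# Altering an earlier block breaks validation, even if its hash is recomputed,
# because the next block still links to the old hash.
chain[0]["records"] = ["document forged"]
chain[0]["hash"] = block_hash(chain[0])
print(is_valid(chain))  # False
```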

Centralized, decentralized, or distributed storage: in some blockchain systems, a single instance of the ledger is stored on a central server that acts as the broker of the data (centralized storage). More commonly, the data lives on different nodes. In the case of decentralized ledger storage, a copy of the ledger is stored on specific "super-nodes". In a distributed architecture, the ledger is replicated on every node.

Permissionless or permissioned blockchain: a permissionless blockchain is a DLT system where no authorization or authentication is needed, and nodes and users are unknown. In a permissioned blockchain, nodes must have a member identity; authorization and authentication are mandatory.

Public or private blockchain: in a public blockchain, any node can join to read blocks and records, append records, and participate in the consensus mechanism. In a private blockchain, only nodes that have been granted authority have that access.

Centralized, decentralized, or distributed ledger control: in the case of centralized control, one authority, e.g. a central server, decides on the validation of a new block of records. With decentralized control, a central authority delegates the validation of new blocks to a limited number of nodes. In a distributed architecture, all the nodes work together using a consensus mechanism to validate a new block.

Consensus mechanism: a consensus mechanism is a procedure by which the nodes reach agreement on the validity and consistency of the records and blocks that are being added to the blockchain. The consensus mechanism also guarantees the order of the records in a distributed ledger. A consensus mechanism can be implemented in many different ways (e.g. Bitcoin requires a proof-of-work), but describing them would go beyond the scope of this reference card.

Still have a question?

In case you can't seem to find a term or if you think you could offer an even better explanation, let us know! We'll be happy to hear from you. 
