Structure recognition for information retrieval and layout
Mining for structure in a reliable, scalable way
Tables, lists and other structural elements are found in many digital articles. These elements typically allow the authors to present information in a structured manner and to communicate and summarize key results and main facts. They allow readers to get a quick overview of the presented information, to compare items and to put them into context. Knowing the physical boundaries of paragraphs can aid screen readers for visually impaired users. Having a concept of tables will help any document-processing flow. And, aside from serving as pure input, structure is a key component when performing conversion. This talk is about bridging the gap between high-level concepts and low-level document formats.
Joris Schellekens
Blockchain and Distributed Ledger Technology for documents
Blockchain is a type of distributed ledger technology (DLT) in which records are organized in blocks that are appended to a single chain using cryptography and distributed consensus. Each block contains a time stamp and a link to the previous block, ensuring that data in a block can't be altered retroactively. This makes blockchain a good choice for the recording of events, provenance tracking, and document life-cycle management. Signing a PDF in the blockchain instead of storing a signature in a PDF reduces the complexity of the code for a developer who needs digital signing and verification functionality. The same principle applies to many other use cases: implementing a document workflow, keeping track of the location of a document, and much more.
Joris Schellekens
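As a rough illustration of the block linking described in the abstract above, here is a minimal Python sketch of a hash-linked chain. It is an assumption-laden toy, not the speaker's implementation: the names `block_hash`, `append_block` and `is_valid` are hypothetical, the chain lives in a single in-memory list, and there is no distribution or consensus at all. It only shows why changing an earlier block is detectable.

```python
import hashlib
import json
import time

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder link for the first block


def block_hash(block):
    """Hash the block's contents, including the link to the previous block."""
    fields = {k: block[k] for k in ("index", "timestamp", "data", "prev_hash")}
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data, timestamp=None):
    """Append a block that stores a time stamp and the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {
        "index": len(chain),
        "timestamp": time.time() if timestamp is None else timestamp,
        "data": data,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    expected_prev = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block):
            return False
        expected_prev = block["hash"]
    return True


if __name__ == "__main__":
    chain = []
    # Record document digests rather than the documents themselves.
    digest = hashlib.sha256(b"contents of a PDF file").hexdigest()
    append_block(chain, {"event": "signed", "document_sha256": digest})
    append_block(chain, {"event": "archived", "document_sha256": digest})
    print(is_valid(chain))

    chain[0]["data"]["event"] = "rejected"  # tamper with an earlier record
    print(is_valid(chain))
```

In this sketch the chain stores only a SHA-256 digest of the document, which mirrors the idea of registering a signature outside the PDF; editing any earlier block changes its hash, so the validity check fails.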
Redaction in electronic documents
How I learned to stop worrying and love PDF
Redaction is the removal of privileged or protected customer data from a document before it is published or distributed. It has gained a lot of importance over the past few years and will continue to do so once the GDPR (General Data Protection Regulation) takes effect. While this is a trivial task for certain document formats, redacting PDF documents poses a few problems. Redaction isn't simply removing the ability to see sensitive data; it is removing all traces of that content while retaining the document properties as well as possible. This talk will delve into what redaction means for PDFs and why it is important. We will showcase different methods of redaction, emphasizing how you can redact your content and how it applies to your business.