How to get the page number of an arbitrary PDF object?

I am trying to find the page number of a PDF object using iText's Java API. The following code reads in the PDF file, and gets the object containing the open action. How do I get the page number of that object?

PdfReader soPdfItext = null;
try {
    soPdfItext = new PdfReader(new FileInputStream(f));
} catch (IOException e) { }
/* Get the catalog */
PdfDictionary soCatalog = soPdfItext.getCatalog();
/* Get the object referring to the open action */
PRIndirectReference soOpenActionReference =
    (PRIndirectReference) soCatalog.get(PdfName.OPENACTION);
/* Get the actual object containing the open action */
PdfObject soOpenActionObject =
    soPdfItext.getPdfObject(soOpenActionReference.getNumber());

Now what? There is a class Document that contains a method getPageNumber(), but I'm not sure if a) it's relevant to what I want to do and b) if it is relevant, how to implement.

Posted on StackOverflow on Jun 15, 2015 by user271621

There are no such things as page numbers in a PDF. Pages are part of a page tree. This page tree consists of /Pages elements (the branches of the tree) and /Page elements (the leaves of the tree). The page index is calculated by traversing the different branches and leaves of the tree. Optionally, a PDF also defines /PageLabels. If you know the page index and if you have the definition of the page labels, you can derive the page number.
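The traversal described above can be sketched in plain Java. This is a minimal model, not iText code: `PageTreeNode` and `PageTreeFlattener` are hypothetical names introduced only for illustration, and each node carries just an object number and its kids. The page index of a leaf is its position in the flattened list, counted from 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a page tree node; this is not an iText class. */
final class PageTreeNode {
    final int objectNumber;         // object number of the /Pages or /Page dictionary
    final boolean isPage;           // true for a /Page leaf, false for a /Pages branch
    final List<PageTreeNode> kids;  // the /Kids of a branch, in document order

    private PageTreeNode(int objectNumber, boolean isPage, List<PageTreeNode> kids) {
        this.objectNumber = objectNumber;
        this.isPage = isPage;
        this.kids = kids;
    }

    static PageTreeNode page(int objectNumber) {
        return new PageTreeNode(objectNumber, true, Collections.<PageTreeNode>emptyList());
    }

    static PageTreeNode pages(int objectNumber, PageTreeNode... kids) {
        return new PageTreeNode(objectNumber, false, Arrays.asList(kids));
    }
}

final class PageTreeFlattener {
    /** Returns the object numbers of all /Page leaves in reading order. */
    static List<Integer> pageObjectNumbers(PageTreeNode root) {
        List<Integer> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    // Depth-first, left-to-right walk: branches are descended into, leaves are recorded.
    private static void collect(PageTreeNode node, List<Integer> result) {
        if (node.isPage) {
            result.add(node.objectNumber);
            return;
        }
        for (PageTreeNode kid : node.kids) {
            collect(kid, result);
        }
    }

    public static void main(String[] args) {
        // Root /Pages (object 1) holds a nested /Pages (object 2) and a /Page (object 6).
        PageTreeNode root = PageTreeNode.pages(1,
                PageTreeNode.pages(2, PageTreeNode.page(8), PageTreeNode.page(9)),
                PageTreeNode.page(6));
        List<Integer> order = pageObjectNumbers(root);
        System.out.println(order);                 // prints [8, 9, 6]
        System.out.println(order.indexOf(6) + 1);  // prints 3
    }
}
```

Note that object numbers say nothing about page order: in this made-up tree, object 6 is the third page even though it has the lowest number among the leaves.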

You are extracting a PdfObject that represents an open action. It can be a PdfDictionary or a PdfArray.

PdfDictionary

If the PdfObject is an instance of a PdfDictionary, then you need to look at the /S item of this dictionary to find out which type of action will be triggered.

  • That action could be some JavaScript. If that JavaScript contains an action that jumps to a specific page, there might be a page number in that method.

  • That action could be a GoTo action, in which case you need to look at the /D entry for the destination (*).

There are 20 possible types of actions, and actions can be chained, so it's up to you to loop through the action chain and to examine every possible action.

This is an example:

/OpenAction<</D[6 0 R/XYZ 0 806 0]/S/GoTo>>

The << and >> indicate that the open action is described using a dictionary. The /S shows that you have a /GoTo action and /D describes the destination.
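Looping through a chain of actions can be sketched as follows. This is a simplified model rather than iText code: action dictionaries are represented as plain `Map<String, Object>` instances keyed without the leading slash, and `ActionChainWalker` is a hypothetical name. It assumes that chained actions hang off the /Next entry, which per the PDF specification holds either a single action dictionary or an array of them. Only /GoTo actions are examined here; the other action types would each need their own handling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; dictionaries are modelled as maps, arrays as lists. */
final class ActionChainWalker {
    /** Collects the /D value of every /GoTo action reachable from the given action. */
    static List<Object> goToDestinations(Map<String, Object> action) {
        List<Object> found = new ArrayList<>();
        // Identity-based set so that a malformed, cyclic chain cannot loop forever.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        walk(action, found, visited);
        return found;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Map<String, Object> action, List<Object> found, Set<Object> visited) {
        if (!visited.add(action)) {
            return; // already examined this action
        }
        // /S names the action type; for /GoTo the destination sits in /D.
        if ("GoTo".equals(action.get("S")) && action.containsKey("D")) {
            found.add(action.get("D"));
        }
        // /Next is either one action dictionary or an array of action dictionaries.
        Object next = action.get("Next");
        if (next instanceof Map) {
            walk((Map<String, Object>) next, found, visited);
        } else if (next instanceof List) {
            for (Object item : (List<Object>) next) {
                if (item instanceof Map) {
                    walk((Map<String, Object>) item, found, visited);
                }
            }
        }
    }
}
```

With real iText objects the same shape would apply, with the map lookups replaced by lookups on the action's PdfDictionary.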

PdfArray

If the PdfObject is an instance of a PdfArray, then this array is a destination (*).

This is an example:

/OpenAction[6 0 R/XYZ 0 806 0]

(*) Destination

A destination is an array that consists of a variable number of elements. These are some examples:

[8 0 R/Fit]
[6 0 R/XYZ 0 806 0]

The first example is an array with two elements: 8 0 R and /Fit. The second example is an array with five elements: 6 0 R, /XYZ, 0, 806 and 0. You need the first element. It doesn't give you the page number (because there is no such thing as page numbers), but it gives you a reference to the /Page object. Based on that reference, you can deduce the page number by looping over the page tree and comparing the object number of each page with the object number in the destination.
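The comparison step can be sketched like this. Again this is a model and not iText code: `Ref` and `DestinationPageLookup` are hypothetical names, the destination is a plain list whose first element is expected to be a `Ref`, and `pagesInOrder` is assumed to already hold the page references in page-tree order. With iText 5, one way to obtain those references should be to call `PdfReader.getPageOrigRef(i)` for i from 1 to `getNumberOfPages()`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an indirect reference such as "6 0 R"; not an iText class. */
final class Ref {
    final int number;
    final int generation;

    Ref(int number, int generation) {
        this.number = number;
        this.generation = generation;
    }
}

final class DestinationPageLookup {
    /**
     * Returns the 1-based position of the page that the destination points to,
     * or -1 if the first element is not a reference or matches no page.
     */
    static int pageNumberOf(List<?> destination, List<Ref> pagesInOrder) {
        if (destination.isEmpty() || !(destination.get(0) instanceof Ref)) {
            return -1;
        }
        Ref target = (Ref) destination.get(0);
        for (int i = 0; i < pagesInOrder.size(); i++) {
            Ref page = pagesInOrder.get(i);
            // Compare object number and generation of each page with the target.
            if (page.number == target.number && page.generation == target.generation) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up document: page 1 is object 8, page 2 is object 6.
        List<Ref> pages = Arrays.asList(new Ref(8, 0), new Ref(6, 0));
        // Models the destination [6 0 R/XYZ 0 806 0].
        List<Object> destination = Arrays.<Object>asList(new Ref(6, 0), "XYZ", 0, 806, 0);
        System.out.println(pageNumberOf(destination, pages)); // prints 2
    }
}
```

The result is a page index in the sense described earlier; turning it into a displayed page number would still require the /PageLabels, if the document defines any.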

