inspect PDF

I am trying to find the page number of a PDF object using iText's Java API.
Examples written in answer to questions such as: How to read bookmark titles?
I am working with a single PDF containing multiple documents. Each document has a bookmark. I need to read the bookmark names for a reconciliation application that I am building.
I have embedded a byte array into a PDF file, more specifically an AVI file in a RichMedia annotation. Now I am trying to extract that same array. How can I do this?
I am using iTextSharp to search for internal links in a PDF file. I already have code to find external links, but I don't know how to find internal links...
I have an application that extracts headings from PDF files. The documents that the application is supposed to work with all have a more or less coherent structure and formatting. In fact, telling whether a text chunk is bold or not is very important.
I am looking for a method to extract the text as well as anchor information using iText. For example: the PDF content is "You can visit our website, XYZ, and do something" where XYZ is a clickable link. The output when extracting this content should be: "You can visit our website, XYZ (www.google.com), and do something".
Can anyone please explain to me what kinds of differences exist between PDF files with the same content, and why the PDF format has this defect, if I may call it that?
We explored many APIs, such as Tika, PDFBox, and iText, to extract page numbers from a PDF file, but we weren't able to meet this requirement. In iText we tried PdfPageLabels.getPageLabels(reader), but the behavior of this method is not uniform.
While extracting the font name from a PDF, I get some junk characters followed by a plus sign and then the font name with the font style. I want to remove the junk characters. I get those junk characters only for a few PDF files, for example: MMLPEO+RemingtonNoiseless
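
The "junk characters" in the font name question are most likely a subset tag: when only part of a font is embedded, the PDF specification prefixes the font name with six uppercase letters and a plus sign. The following is a minimal sketch in plain Java, not part of iText, that strips such a tag from a name that has already been extracted. The class name SubsetFontName and the method stripSubsetTag are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it is not an iText class.
class SubsetFontName {

    // A subset tag is exactly six uppercase ASCII letters followed by '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    // Returns the font name without a leading subset tag.
    // Names that carry no such tag are returned unchanged.
    static String stripSubsetTag(String fontName) {
        if (fontName == null) {
            return null;
        }
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    public static void main(String[] args) {
        // Prints "RemingtonNoiseless"
        System.out.println(stripSubsetTag("MMLPEO+RemingtonNoiseless"));
    }
}
```

Matching exactly six uppercase letters, rather than everything up to the first plus sign, is a deliberate choice: it leaves an ordinary font name that happens to contain a plus sign untouched.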