It’s been quite a while since we’ve done one of these. At iText, our developers often investigate and resolve problems that our users encounter with PDFs. A few years ago, we presented a PDF Days talk entitled “7 cases for the PDF detectives”, which detailed a variety of mysterious PDF issues we’ve solved over the years.
We’ve decided to resurrect the series because we recently resolved an issue for a customer whose interactive form exhibited some mysterious behavior. We thought it might help others who run into similar problems if we described the issues in detail, along with how we overcame them.
The methods described work for both AGPL and commercial iText 7 users. However, to render the sample text correctly in every language used in the following code samples, you need the commercially licensed pdfCalligraph add-on. Without it you will see rendering problems such as Arabic being written left-to-right instead of right-to-left. pdfCalligraph was developed to solve exactly these issues, and it supports a vast number of languages and writing systems. If you don't have a commercial license, you can get a free trial of the entire iText 7 Suite, which includes the iText 7 Core library plus all its add-ons.
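The direction problem can be observed with the JDK alone. The minimal sketch below (DirectionCheck is a hypothetical helper, not part of iText) uses java.text.Bidi to show that the Arabic sample sentence resolves to a right-to-left run while the English one does not. It only detects direction; it does not shape or reorder glyphs in a PDF, which is the work pdfCalligraph does.

```java
import java.text.Bidi;

// Hypothetical helper: detects right-to-left runs using the JDK's implementation
// of the Unicode bidirectional algorithm. Detection only; no shaping or reordering.
class DirectionCheck {

    // True if any run in the text has an odd (right-to-left) embedding level.
    static boolean containsRtlRun(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        for (int run = 0; run < bidi.getRunCount(); run++) {
            if ((bidi.getRunLevel(run) & 1) == 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsRtlRun("The quick brown fox jumps over the lazy dog.?!"));
        System.out.println(containsRtlRun("الثعلب البني السريع يقفز فوق الكلب الكسول.؟!"));
    }
}
```

A string that merely contains an RTL run still needs reordering and, for Arabic, contextual shaping before it can be drawn correctly, which is why plain font selection alone is not enough for this script.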
Presenting the case
This customer was using iText 7 to programmatically fill fields in an interactive form (or AcroForm, as they are commonly called). They needed to enter data into form fields at the end of a PDF in more than one language, in this case Russian and Arabic. However, when they filled out the form, strange things happened:
- When the form was filled out but not flattened, and the resulting PDF was opened in Acrobat, only the Russian text was visible; the Arabic text was not. Clicking the form field, however, made the Arabic text appear.
- When the form was flattened, neither language’s text was displayed at all.
Unsurprisingly, this was not the behavior the customer expected. Text in a different language that only appeared when a field was clicked was one thing, but both languages disappearing completely after flattening was quite another. To make matters worse, the customer told us they had tried the same thing in iText 5, and it had worked perfectly.
So, here we have not one but three mysteries. Why was one language displayed but not both? Why did they both disappear once the form was flattened? And why was iText 5 seemingly immune to both issues?
Examining the evidence
The customer told us that in iText 5 they had added the required language fonts using the AcroFields.addSubstitutionFont API, but they could not find an equivalent API in iText 7. This substitution-font workflow is not available in iText 7 because the PDF 1.7 and 2.0 specifications limit the visual representation of a text field to a single style. As the Interactive Features section of the PDF 1.7 specification (ISO 32000-1) notes regarding text fields:
The field’s text is held in a text string (or, beginning with PDF 1.5, a stream) in the V (value) entry of the field dictionary. The contents of this text string or stream shall be used to construct an appearance stream for displaying the field, as described under 12.7.3.3, “Variable Text.” The text shall be presented in a single style (font, size, color, and so forth), as specified by the DA (default appearance) string.
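To make the “single style” constraint concrete: the DA string is a short fragment of content-stream operators, and its Tf operator selects one font resource and one size for the whole field. The sketch below parses the font name and size out of a DA string such as /Helv 12 Tf 0 g (an illustrative value). DefaultAppearance is a hypothetical class, and its tokenizer is deliberately simplified: it assumes whitespace-separated operands and uses the last Tf it finds.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: extracts the font resource name and size set by the Tf
// operator in a field's DA (default appearance) string, e.g. "/Helv 12 Tf 0 g".
// Simplified: whitespace-separated operands; if several Tf operators appear,
// the last one wins.
final class DefaultAppearance {
    final String fontName;
    final float fontSize;

    private DefaultAppearance(String fontName, float fontSize) {
        this.fontName = fontName;
        this.fontSize = fontSize;
    }

    static DefaultAppearance parse(String da) {
        List<String> tokens = Arrays.asList(da.trim().split("\\s+"));
        int tf = tokens.lastIndexOf("Tf");
        if (tf < 2) {
            throw new IllegalArgumentException("No Tf operator with two operands in: " + da);
        }
        String name = tokens.get(tf - 2);                  // font resource name, e.g. "/Helv"
        float size = Float.parseFloat(tokens.get(tf - 1)); // font size, e.g. 12
        return new DefaultAppearance(name.startsWith("/") ? name.substring(1) : name, size);
    }

    public static void main(String[] args) {
        DefaultAppearance da = parse("/Helv 12 Tf 0 g");
        // One font and one size describe the whole field's text
        System.out.println(da.fontName + " " + da.fontSize);
    }
}
```

Because there is only one font slot, any character the selected font lacks cannot be drawn from the DA alone; that is the gap substitution fonts used to paper over.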
As for the Arabic text only appearing once the field had been selected: because no font containing Arabic glyphs had been supplied, the font used to draw the field’s appearance could not render that text. Clicking the field caused Acrobat to regenerate the field’s appearance and substitute a compatible font containing the required glyphs.
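The missing-glyph problem can be modeled in a few lines. In the sketch below, a font’s coverage is approximated as a set of Unicode scripts (an assumption for illustration; a real font maps individual code points to glyphs through its cmap table), and CoverageCheck, a hypothetical helper, lists the characters such a font cannot draw.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper. Simplification: a font's coverage is modeled as a set of
// Unicode scripts; real fonts map individual code points to glyphs via a cmap.
class CoverageCheck {

    // Returns the characters of `text` that a font with the given coverage cannot draw.
    // COMMON (spaces, digits, most punctuation) and INHERITED (combining marks)
    // characters are treated as drawable by any font in this model.
    static List<String> missingCharacters(String text, Set<Character.UnicodeScript> covered) {
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED;
            if (!neutral && !covered.contains(script)) {
                missing.add(new String(Character.toChars(cp)));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        Set<Character.UnicodeScript> latinAndCyrillic =
                EnumSet.of(Character.UnicodeScript.LATIN, Character.UnicodeScript.CYRILLIC);
        System.out.println(missingCharacters("Быстрая коричневая лиса", latinAndCyrillic));
        System.out.println(missingCharacters("الثعلب البني", latinAndCyrillic));
    }
}
```

With a Latin-and-Cyrillic font, the Russian sentence reports nothing missing, while every Arabic letter is flagged, mirroring the symptom in this case.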
The solution
So the missing compatible fonts were indeed the issue. To resolve it, we needed to check the values passed to the form and assign a font based on each value. Our support team produced the following code example for the customer, which uses the FontSelector (Java/.NET) and FontProvider (Java/.NET) classes to load the fonts needed for English, Arabic, Russian and Chinese text and to select the right one for each string.
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.font.FontProvider;
import com.itextpdf.layout.font.FontSelectorStrategy;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public void createPdf(String dest) throws IOException {
    PdfWriter writer = new PdfWriter(dest);
    PdfDocument pdfDoc = new PdfDocument(writer);
    Document doc = new Document(pdfDoc);
    FontProvider provider = new FontProvider();
    // Load the fonts in the directory into the provider
    provider.addDirectory("./fonts");
    // Build a priority list of font families to use
    List<String> fontfamilies = new ArrayList<>();
    fontfamilies.add("Noto Sans Blk"); // Family name as found in the font file using FontForge
    fontfamilies.add("Noto Naskh Arabic");
    fontfamilies.add("Open Sans");
    fontfamilies.add("ZCOOL QingKe HuangYou");
    // English
    String textEnglish = "The quick brown fox jumps over the lazy dog.?!";
    Paragraph paraEnglish = new Paragraph(textEnglish);
    // Build a strategy based on the string and the priority list
    FontSelectorStrategy strategy = provider.getStrategy(textEnglish, fontfamilies);
    strategy.nextGlyphs(); // Make the strategy choose a font for the first run of significant glyphs
    // Since our string is mono-language, we can extract the font used after a single run of nextGlyphs
    PdfFont englishFont = strategy.getCurrentFont();
    paraEnglish.setFont(englishFont);
    doc.add(paraEnglish);
    // Arabic
    String textArabic = "الثعلب البني السريع يقفز فوق الكلب الكسول.؟!";
    Paragraph paraArabic = new Paragraph(textArabic);
    strategy = provider.getStrategy(textArabic, fontfamilies);
    strategy.nextGlyphs(); // Make the strategy choose
    PdfFont arabicFont = strategy.getCurrentFont();
    paraArabic.setFont(arabicFont);
    doc.add(paraArabic);
    // Cyrillic
    String textRussian = "Быстрая коричневая лиса прыгает через ленивую собаку.?!";
    Paragraph paraRussian = new Paragraph(textRussian);
    strategy = provider.getStrategy(textRussian, fontfamilies);
    strategy.nextGlyphs(); // Make the strategy choose
    PdfFont russianFont = strategy.getCurrentFont();
    paraRussian.setFont(russianFont);
    doc.add(paraRussian);
    // Chinese
    String textChinese = "敏捷的棕色狐狸跳过了懒狗。?!";
    Paragraph paraChinese = new Paragraph(textChinese);
    strategy = provider.getStrategy(textChinese, fontfamilies);
    strategy.nextGlyphs(); // Make the strategy choose
    PdfFont chineseFont = strategy.getCurrentFont();
    paraChinese.setFont(chineseFont);
    doc.add(paraChinese);
    doc.close();
}
using System;
using System.Collections.Generic;
using iText.Kernel.Font;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Font;

public void CreatePdf(String dest)
{
    PdfWriter writer = new PdfWriter(dest);
    PdfDocument pdfDoc = new PdfDocument(writer);
    Document doc = new Document(pdfDoc);
    FontProvider provider = new FontProvider();
    // Load the fonts in the directory into the provider
    provider.AddDirectory("./fonts");
    // Build a priority list of font families to use
    IList<String> fontfamilies = new List<String>();
    fontfamilies.Add("Noto Sans Blk"); // Family name as found in the font file using FontForge
    fontfamilies.Add("Noto Naskh Arabic");
    fontfamilies.Add("Open Sans");
    fontfamilies.Add("ZCOOL QingKe HuangYou");
    // English
    String textEnglish = "The quick brown fox jumps over the lazy dog.?!";
    Paragraph paraEnglish = new Paragraph(textEnglish);
    // Build a strategy based on the string and the priority list
    FontSelectorStrategy strategy = provider.GetStrategy(textEnglish, fontfamilies);
    strategy.NextGlyphs(); // Make the strategy choose a font for the first run of significant glyphs
    // Since our string is mono-language, we can extract the font used after a single run of NextGlyphs
    PdfFont englishFont = strategy.GetCurrentFont();
    paraEnglish.SetFont(englishFont);
    doc.Add(paraEnglish);
    // Arabic
    String textArabic = "الثعلب البني السريع يقفز فوق الكلب الكسول.؟!";
    Paragraph paraArabic = new Paragraph(textArabic);
    strategy = provider.GetStrategy(textArabic, fontfamilies);
    strategy.NextGlyphs(); // Make the strategy choose
    PdfFont arabicFont = strategy.GetCurrentFont();
    paraArabic.SetFont(arabicFont);
    doc.Add(paraArabic);
    // Cyrillic
    String textRussian = "Быстрая коричневая лиса прыгает через ленивую собаку.?!";
    Paragraph paraRussian = new Paragraph(textRussian);
    strategy = provider.GetStrategy(textRussian, fontfamilies);
    strategy.NextGlyphs(); // Make the strategy choose
    PdfFont russianFont = strategy.GetCurrentFont();
    paraRussian.SetFont(russianFont);
    doc.Add(paraRussian);
    // Chinese
    String textChinese = "敏捷的棕色狐狸跳过了懒狗。?!";
    Paragraph paraChinese = new Paragraph(textChinese);
    strategy = provider.GetStrategy(textChinese, fontfamilies);
    strategy.NextGlyphs(); // Make the strategy choose
    PdfFont chineseFont = strategy.GetCurrentFont();
    paraChinese.SetFont(chineseFont);
    doc.Add(paraChinese);
    doc.Close();
}
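This produces a PDF containing one paragraph per language, each set in the font that the strategy selected for it. The same per-value font selection can then be applied when filling a form field.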
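The priority list passed to getStrategy can be read as “use the first family that can draw this text”. The following sketch models that idea for a single-script string in plain Java. It is not iText’s selection algorithm: PrioritySelector, the font labels and their script coverage are all hypothetical, chosen only to mirror the shape of the fontfamilies list above.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Simplified model of priority-list font selection (NOT iText's algorithm).
// Each "font" is a hypothetical label mapped to an assumed set of Unicode scripts.
class PrioritySelector {

    // Returns the first font (in map iteration order) that covers every
    // non-COMMON/INHERITED character of the text, or empty if none does.
    static Optional<String> pickFont(String text, Map<String, Set<Character.UnicodeScript>> fontsByPriority) {
        for (Map.Entry<String, Set<Character.UnicodeScript>> font : fontsByPriority.entrySet()) {
            boolean coversAll = text.codePoints()
                    .mapToObj(Character.UnicodeScript::of)
                    .allMatch(s -> s == Character.UnicodeScript.COMMON
                            || s == Character.UnicodeScript.INHERITED
                            || font.getValue().contains(s));
            if (coversAll) {
                return Optional.of(font.getKey());
            }
        }
        return Optional.empty();
    }

    // Hypothetical priority list; LinkedHashMap preserves insertion (priority) order.
    static Map<String, Set<Character.UnicodeScript>> sampleFonts() {
        Map<String, Set<Character.UnicodeScript>> fonts = new LinkedHashMap<>();
        fonts.put("latin-cyrillic-font", EnumSet.of(Character.UnicodeScript.LATIN, Character.UnicodeScript.CYRILLIC));
        fonts.put("arabic-font", EnumSet.of(Character.UnicodeScript.ARABIC));
        fonts.put("han-font", EnumSet.of(Character.UnicodeScript.HAN));
        return fonts;
    }

    public static void main(String[] args) {
        System.out.println(pickFont("Быстрая коричневая лиса", sampleFonts()));
        System.out.println(pickFont("敏捷的棕色狐狸", sampleFonts()));
    }
}
```

Order matters in this model: if two fonts both cover a string, the one listed first wins, which is the reason the article builds fontfamilies as an ordered list.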
Case closed?
The customer responded that this example had indeed solved their issues. However, they wanted to know whether several languages could be displayed in one form field, as part of the same text string. The example they gave was an address field, which a person might fill out using both Arabic and Cyrillic characters, but you can probably think of other cases.
This was a little trickier to solve, given the single-style requirement for fields quoted from the PDF specification above. However, the font-selection strategy in the example above already handles multi-language, multi-script strings: it can split such a string into runs, each with a font that contains the required glyphs. What remains is to present the field with a single string containing all the text, and to build its appearance from those runs.
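Splitting a mixed-script string into runs, each drawn with a font that covers it, is the key idea. The sketch below models it in plain Java under a simplifying assumption: fonts are hypothetical labels with an assumed set of Unicode scripts, and neutral characters such as spaces and punctuation (script COMMON or INHERITED) stay in the current run. RunSplitter is an illustration of the technique, not iText’s FontSelectorStrategy.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustration only: splits a mixed-script string into runs, tagging each run with the
// first font (in priority order) whose ASSUMED script coverage includes its characters.
class RunSplitter {

    static final String NO_FONT = "(no font)";

    static final class Run {
        final String font;
        final String text;

        Run(String font, String text) {
            this.font = font;
            this.text = text;
        }

        @Override
        public String toString() {
            return font + ": \"" + text + "\"";
        }
    }

    // fontsByPriority must be non-empty; its iteration order is the priority order.
    static List<Run> split(String text, Map<String, Set<Character.UnicodeScript>> fontsByPriority) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentFont = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED;
            // Neutral characters join the current run (or the top-priority font at the start)
            String font = neutral
                    ? (currentFont != null ? currentFont : fontsByPriority.keySet().iterator().next())
                    : firstCovering(script, fontsByPriority);
            if (current.length() > 0 && !Objects.equals(font, currentFont)) {
                runs.add(new Run(currentFont, current.toString())); // close the previous run
                current.setLength(0);
            }
            currentFont = font;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentFont, current.toString()));
        }
        return runs;
    }

    private static String firstCovering(Character.UnicodeScript script,
                                        Map<String, Set<Character.UnicodeScript>> fontsByPriority) {
        for (Map.Entry<String, Set<Character.UnicodeScript>> font : fontsByPriority.entrySet()) {
            if (font.getValue().contains(script)) {
                return font.getKey();
            }
        }
        return NO_FONT;
    }

    // Hypothetical fonts and coverage, for illustration only.
    static Map<String, Set<Character.UnicodeScript>> sampleFonts() {
        Map<String, Set<Character.UnicodeScript>> fonts = new LinkedHashMap<>();
        fonts.put("latin-cyrillic-font", EnumSet.of(Character.UnicodeScript.LATIN, Character.UnicodeScript.CYRILLIC));
        fonts.put("arabic-font", EnumSet.of(Character.UnicodeScript.ARABIC));
        fonts.put("han-font", EnumSet.of(Character.UnicodeScript.HAN));
        return fonts;
    }

    public static void main(String[] args) {
        split("fox лиса الثعلب 狐狸?!", sampleFonts()).forEach(System.out::println);
    }
}
```

Each resulting run corresponds to one piece of text that can be given a single font, which is what allows a multi-font appearance to be assembled for a field whose DA only names one font.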
Solving the subsequent problem
The following example builds the field’s appearance stream from a composite string containing text in all four languages, assigning each run of text a font that can render it:
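import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.io.font.otf.Glyph;
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfArray;
import com.itextpdf.kernel.pdf.PdfDictionary;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.xobject.PdfFormXObject;
import com.itextpdf.layout.Canvas;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Text;
import com.itextpdf.layout.font.FontProvider;
import com.itextpdf.layout.font.FontSelectorStrategy;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FontProviderAndFormFieldExample {
    // Example output paths and font directory; adjust to your environment
    public static final String DEST = "multi_font_form.pdf";
    public static final String DEST_FILLED = "multi_font_form_filled.pdf";
    public static final String SRCF = "./fonts";

    public static final String FIELDNAME = "test";
    public static final Rectangle FIELDRECT = new Rectangle(50, 300, 300, 300);
    public static final String FIELDVALUE = "The quick brown fox jumps over the lazy dog.?! " +
            "الثعلب البني السريع يقفز فوق الكلب الكسول.؟!" +
            " Быстрая коричневая лиса прыгает через ленивую собаку." +
            "?!敏捷的棕色狐狸跳过了懒狗。?!";

    public static void main(String[] args) throws IOException {
        FontProviderAndFormFieldExample app = new FontProviderAndFormFieldExample();
        app.createPdf(DEST);
        // Fill the field in the PDF created above
        app.fillExample(DEST, DEST_FILLED, SRCF);
    }

    // Creates a one-page PDF with an empty text field outlined in blue
    public void createPdf(String dest) throws IOException {
        PdfWriter writer = new PdfWriter(dest);
        PdfDocument pdfDoc = new PdfDocument(writer);
        Document doc = new Document(pdfDoc);
        doc.add(new Paragraph("Test document for multi-font appearance in a text form field"));
        PdfAcroForm acroForm = PdfAcroForm.getAcroForm(pdfDoc, true);
        PdfFormField ff = PdfFormField.createText(pdfDoc, FIELDRECT, FIELDNAME, "");
        acroForm.addField(ff, pdfDoc.getFirstPage());
        PdfCanvas pdfCanvas = new PdfCanvas(pdfDoc.getFirstPage());
        pdfCanvas.setLineWidth(1f).setStrokeColor(ColorConstants.BLUE).rectangle(FIELDRECT).stroke();
        doc.close();
    }

    public void fillExample(String src, String dest, String srcf) throws IOException {
        PdfReader reader = new PdfReader(src);
        PdfWriter writer = new PdfWriter(dest);
        PdfDocument pdfDoc = new PdfDocument(reader, writer);
        PdfAcroForm acroForm = PdfAcroForm.getAcroForm(pdfDoc, true);
        PdfFormField ff = acroForm.getField(FIELDNAME);
        fillFormfield(FIELDVALUE, ff, pdfDoc, srcf);
        pdfDoc.close();
    }

    // Sets the value, then replaces the field's normal appearance with a multi-font one
    public void fillFormfield(String value, PdfFormField ff, PdfDocument pdfDoc, String srcf) {
        ff.setValue(value);
        Rectangle rect = ff.getPdfObject().getAsRectangle(PdfName.Rect);
        PdfStream appearance = craftAppearanceDictionary(value,
                new Rectangle(rect.getWidth(), rect.getHeight()), srcf, pdfDoc);
        PdfDictionary appearanceDic = new PdfDictionary();
        appearanceDic.put(PdfName.N, appearance);
        ff.put(PdfName.AP, appearanceDic);
    }

    public PdfStream craftAppearanceDictionary(String s, Rectangle bBox, String srcf, PdfDocument pdfDoc) {
        PdfStream app = new PdfStream();
        PdfFormXObject formXObject = new PdfFormXObject(app);
        formXObject.setBBox(new PdfArray(bBox));
        Canvas canvas = new Canvas(formXObject, pdfDoc);
        Paragraph para = new Paragraph();
        FontProvider provider = new FontProvider();
        // Load the fonts in the given directory into the provider
        provider.addDirectory(srcf);
        // Build a priority list of font families to use
        List<String> fontfamilies = new ArrayList<>();
        fontfamilies.add("Noto Sans Blk"); // Family name as found in the font file using FontForge
        fontfamilies.add("Noto Naskh Arabic");
        fontfamilies.add("Open Sans");
        fontfamilies.add("ZCOOL QingKe HuangYou");
        FontSelectorStrategy strategy = provider.getStrategy(s, fontfamilies);
        String currentSubsString = s;
        while (true) {
            // Next run of glyphs that a single font can render
            List<Glyph> nextGlyphs = strategy.nextGlyphs();
            int nrOfGlyphs = nextGlyphs.size();
            if (nrOfGlyphs <= 0) {
                break; // the whole string has been consumed
            }
            PdfFont nextFont = strategy.getCurrentFont();
            // Assumes each glyph corresponds to one char of the string
            Text nextText = new Text(currentSubsString.substring(0, nrOfGlyphs));
            nextText.setFont(nextFont);
            para.add(nextText);
            currentSubsString = currentSubsString.substring(nrOfGlyphs);
        }
        canvas.add(para);
        canvas.close();
        return app;
    }
}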
using System;
using System.Collections.Generic;
using iText.Forms;
using iText.Forms.Fields;
using iText.IO.Font.Otf;
using iText.Kernel.Colors;
using iText.Kernel.Font;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas;
using iText.Kernel.Pdf.Xobject;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Font;

public class FontProviderAndFormFieldExample
{
    // Example output paths; adjust to your environment
    public static readonly String DEST = "multi_font_form.pdf";
    public static readonly String DEST_FILLED = "multi_font_form_filled.pdf";

    public static String FIELDNAME = "test";
    public static Rectangle FIELDRECT = new Rectangle(50, 300, 300, 300);
    public static String FIELDVALUE = "The quick brown fox jumps over the lazy dog.?! " +
        "الثعلب البني السريع يقفز فوق الكلب الكسول.؟!" +
        " Быстрая коричневая лиса прыгает через ленивую собаку." +
        "?!敏捷的棕色狐狸跳过了懒狗。?!";

    static void Main(string[] args)
    {
        FontProviderAndFormFieldExample app = new FontProviderAndFormFieldExample();
        app.createPdf(DEST);
        // Fill the field in the PDF created above
        app.fillExample(DEST, DEST_FILLED);
    }

    // Creates a one-page PDF with an empty text field outlined in blue
    public void createPdf(String dest)
    {
        PdfWriter writer = new PdfWriter(dest);
        PdfDocument pdfDoc = new PdfDocument(writer);
        Document doc = new Document(pdfDoc);
        doc.Add(new Paragraph("Test document for multi-font appearance in a text form field"));
        PdfAcroForm acroForm = PdfAcroForm.GetAcroForm(pdfDoc, true);
        PdfFormField ff = PdfFormField.CreateText(pdfDoc, FIELDRECT, FIELDNAME, "");
        acroForm.AddField(ff, pdfDoc.GetFirstPage());
        PdfCanvas pdfCanvas = new PdfCanvas(pdfDoc.GetFirstPage());
        pdfCanvas.SetLineWidth(1f).SetStrokeColor(ColorConstants.BLUE).Rectangle(FIELDRECT).Stroke();
        doc.Close();
    }

    public void fillExample(String src, String dest)
    {
        PdfReader reader = new PdfReader(src);
        PdfWriter writer = new PdfWriter(dest);
        PdfDocument pdfDoc = new PdfDocument(reader, writer);
        PdfAcroForm acroForm = PdfAcroForm.GetAcroForm(pdfDoc, true);
        PdfFormField ff = acroForm.GetField(FIELDNAME);
        fillFormfield(FIELDVALUE, ff, pdfDoc);
        pdfDoc.Close();
    }

    // Sets the value, then replaces the field's normal appearance with a multi-font one
    public void fillFormfield(String value, PdfFormField ff, PdfDocument pdfDoc)
    {
        ff.SetValue(value);
        Rectangle rect = ff.GetPdfObject().GetAsRectangle(PdfName.Rect);
        PdfStream appearance =
            craftAppearanceDictionary(value, new Rectangle(rect.GetWidth(), rect.GetHeight()), pdfDoc);
        PdfDictionary appearanceDic = new PdfDictionary();
        appearanceDic.Put(PdfName.N, appearance);
        ff.Put(PdfName.AP, appearanceDic);
    }

    public PdfStream craftAppearanceDictionary(String s, Rectangle bBox, PdfDocument pdfDoc)
    {
        PdfStream app = new PdfStream();
        PdfFormXObject formXObject = new PdfFormXObject(app);
        formXObject.SetBBox(new PdfArray(bBox));
        Canvas canvas = new Canvas(formXObject, pdfDoc);
        Paragraph para = new Paragraph();
        FontProvider provider = new FontProvider();
        // Load the fonts in the directory into the provider
        provider.AddDirectory("./fonts");
        // Build a priority list of font families to use
        IList<String> fontfamilies = new List<String>();
        fontfamilies.Add("Noto Sans Blk"); // Family name as found in the font file using FontForge
        fontfamilies.Add("Noto Naskh Arabic");
        fontfamilies.Add("Open Sans");
        fontfamilies.Add("ZCOOL QingKe HuangYou");
        FontSelectorStrategy strategy = provider.GetStrategy(s, fontfamilies);
        String currentSubsString = s;
        while (true)
        {
            // Next run of glyphs that a single font can render
            IList<Glyph> nextGlyphs = strategy.NextGlyphs();
            int nrOfGlyphs = nextGlyphs.Count;
            if (nrOfGlyphs <= 0)
            {
                break; // the whole string has been consumed
            }
            PdfFont nextFont = strategy.GetCurrentFont();
            // Assumes each glyph corresponds to one char of the string
            Text nextText = new Text(currentSubsString.Substring(0, nrOfGlyphs));
            nextText.SetFont(nextFont);
            para.Add(nextText);
            currentSubsString = currentSubsString.Substring(nrOfGlyphs);
        }
        canvas.Add(para);
        canvas.Close();
        return app;
    }
}
As you can see, the code calls strategy.nextGlyphs() repeatedly. Each call returns the next run of glyphs that a single font can render, and strategy.getCurrentFont() then gives you that font; the matching part of the String is added to the paragraph as a Text element in that font, until every character has been consumed. The result is a form field in which the text in each language is combined into a single string.
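One caveat about the loop above: it advances through the value with substring(0, nrOfGlyphs), and Java’s substring counts UTF-16 chars. That is fine for the sample text here, which lies entirely in the Basic Multilingual Plane, but characters outside it (some rarer CJK ideographs, for example) occupy two chars each. If the glyph count returned by nextGlyphs() corresponds to code points for your input (an assumption worth checking against your iText version), a code-point-aware cut such as this hypothetical CodePoints helper avoids splitting a surrogate pair:

```java
// Hypothetical helper: returns the first n code points of s. Unlike
// s.substring(0, n), which counts UTF-16 chars, this never splits a surrogate pair.
class CodePoints {

    static String firstCodePoints(String s, int n) {
        // Throws IndexOutOfBoundsException if s has fewer than n code points
        int end = s.offsetByCodePoints(0, n);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // U+20BB7, a CJK ideograph outside the BMP, stored as a surrogate pair
        String s = "a\uD842\uDFB7b";
        System.out.println(s.length());                      // UTF-16 chars
        System.out.println(s.codePointCount(0, s.length())); // code points
        System.out.println(firstCodePoints(s, 2));
    }
}
```

In the loop, the substring call would then become firstCodePoints(currentSubsString, nrOfGlyphs), with currentSubsString advanced by the length of the returned prefix rather than by nrOfGlyphs.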
That concludes our case for now, but we hope you’ve found it interesting and/or useful if you're trying to solve a similar PDF mystery.