PyMuPDF: unexpected result for Chinese text in a FreeText annotation (page.add_freetext_annot)

Hi,

I used page.add_freetext_annot() to add Chinese text to a PDF. The input text is ‘不递交’, but the result shows ‘不 交’. Is this because the Chinese font is not available for FreeText annotations? If so, is there a plan to add a Chinese font for FreeText annotations in the future?

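Python code:

```python
import pymupdf

# current_page, r, font_size, black and filled_color are defined
# elsewhere in the original script (not shown in this post)
newfont = "china-s"

t2 = "不递交"

annot = current_page.add_freetext_annot(
    r,
    t2,
    fontsize=font_size,
    fontname=newfont,
    rotate=0,
    text_color=black,
    fill_color=filled_color,
    align=pymupdf.TEXT_ALIGN_LEFT,
)
```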

Thanks

Regards

Li

Hi @wangli2, thank you for your post.

I’ve had a look, and it does seem that the “递” character does not render unless the richtext parameter is set to True.

I tried this and it worked:

```python
import pymupdf

doc = pymupdf.open()  # open a blank document
page = doc.new_page()  # append a new page (default size) and get it

newfont = "china-s"

t2 = "不递交"

rect = pymupdf.Rect(0, 0, 100, 100)

# draw a visible frame so the annotation rectangle can be seen
page.draw_rect(rect, color=(0, 0, 1), fill=(1, 1, 0), width=2)

# rich-text FreeText annotation: "递" renders in this mode
annot = page.add_freetext_annot(
    rect,
    t2,
    fontsize=18,
    fontname=newfont,
    richtext=True,
    style="text-align:left;padding:0;margin:0;width:100px;height:100px;",
)

# plain (non-rich) FreeText annotation in the same rectangle, for comparison
annot = page.add_freetext_annot(rect, "TEST", fontsize=18, richtext=False)

doc.save("document-with-chinese.pdf")
```

However, I couldn’t get the styling to align the text to the left as I expected. If I change text-align from left to right, the text moves to the right as it should. But I would expect left-aligned text to start at the rectangle's starting position (as the word “TEST” does in my example).

I will need to look into this further, but I hope this helps in the meantime!

Hi Jamie, thanks for investigating; it is very helpful.

I tried your method with richtext=True and style=“text-align:left;font-size:6px;font-family:SimSun;padding:0;margin:0;”. It works, and ‘不递交’ is shown correctly (fig 1).

When I resized the border of the FreeText annotation to fit the text, the text disappeared (fig 2).

When I then double-clicked another annotation, I found some HTML code before the text ‘不递交’ (fig 3).

After I deleted that HTML prefix and the spaces before the text, the text was left-aligned, and I could resize the border manually as needed without losing the text (fig 4). Is there a way to remove that HTML prefix when using richtext=True together with a style?
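The markup itself stays in the annotation, but if only the plain text is needed (for example, to inspect or reuse it), it can be recovered from the rich-text string with Python's standard html.parser module. This is a minimal sketch; rich_text_to_plain is a hypothetical helper, and it only reads text, it does not change the annotation.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect character data; tags and the <?xml ...?> declaration are skipped."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def rich_text_to_plain(markup: str) -> str:
    """Return the visible text of rich-text (XHTML) markup, trimmed."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()
```

Assuming the markup is stored in the annotation's /RC entry (where the PDF specification keeps rich text for markup annotations), it could be read with doc.xref_get_key(annot.xref, "RC"), which returns a (type, value) tuple, and the value passed to this function.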

Thanks

Hi @wangli2, unfortunately I don’t think there is a way to remove the HTML markup from the annotation, because it is rich (HTML) text. I think we need to understand why the “递” character requires richtext=True to display. @HaraldLieder, any ideas?

No, not yet; I have not had time to reproduce the case.
The MuPDF code responsible for actually filling in the text may use a different internal CJK font for plain versus rich-text annotations.
But this needs to be investigated.