I used page.add_freetext_annot() to add Chinese text to a PDF. The input text is ‘不递交’, but the result shows ‘不 交’. Is this because no Chinese font is available for FreeText annotations? If so, is there a plan to add a Chinese font for FreeText annotations in the future?
I’ve had a look and indeed it seems like the “递” character does not render unless the richtext parameter is set to True.
I tried this and it worked:
import pymupdf
doc = pymupdf.open()  # open a blank document
page = doc.new_page()  # append a new page and keep a reference to it
newfont = "china-s"
t2 = "不递交"
rect = pymupdf.Rect(0, 0, 100, 100)
# draw the annotation rectangle so its position is visible
page.draw_rect(rect, color=(0, 0, 1), fill=(1, 1, 0), width=2)
# rich-text annotation: this one renders all three characters
annot = page.add_freetext_annot(rect, t2, fontsize=18, fontname=newfont, richtext=True, style="text-align:left;padding:0;margin:0;width:100px;height:100px;")
# plain-text annotation in the same rectangle, for comparison
annot = page.add_freetext_annot(rect, "TEST", fontsize=18, richtext=False)
doc.save("document-with-chinese.pdf")
However, I couldn’t get the styling to left-align the text as I expected. If I change the alignment from left to right, the text moves to the right as it should. With left alignment, though, I would expect the text to render at the start position (like the word “TEST” does in my example).
I will need to look further into this, but I hope it helps in the meantime!
Hi Jamie, thanks for your investigation. It is very helpful.
I’ve tried your method with (richtext=True, style="text-align:left;font-size:6px;font-family:SimSun;padding:0;margin:0;"). It works, and ‘不递交’ displays correctly (fig 1).
After deleting the HTML prefix and the spaces before the text, it is left-aligned. I can also manually resize the border as needed, and the text is kept (fig 4). Is there a way to remove that HTML prefix when using richtext=True with the style parameter?
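For reference, the rich-text call described above can be wrapped in a small helper. This is only a sketch: css_style and add_cjk_freetext are hypothetical names, not part of PyMuPDF, and it assumes the SimSun font is available to the renderer, as it was in the test above. The page argument is expected to be a PyMuPDF page object.

```python
def css_style(**props):
    """Join CSS properties into one style string.

    Underscores in keyword names become hyphens, so text_align="left"
    yields "text-align:left;". Hypothetical helper, not a PyMuPDF API.
    """
    return "".join(f"{key.replace('_', '-')}:{value};" for key, value in props.items())


def add_cjk_freetext(page, rect, text, font_family="SimSun", font_size_px=6):
    """Add a rich-text FreeText annotation using the style from this thread.

    Assumption: font_family names a font the renderer can resolve.
    """
    style = css_style(
        text_align="left",
        font_size=f"{font_size_px}px",
        font_family=font_family,
        padding=0,
        margin=0,
    )
    # richtext=True makes PyMuPDF treat the text as HTML styled by `style`
    return page.add_freetext_annot(rect, text, richtext=True, style=style)
```

Calling add_cjk_freetext(page, rect, "不递交") should then be equivalent to passing the style string by hand.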
Hi @wangli2 - unfortunately I don’t think there is a way to remove the HTML markup from the annotation, as it is rich (HTML) text. I think we need to understand why the “递” character requires richtext=True to display. @HaraldLieder Any ideas?
No, not yet. I did not have the time to reproduce the case.
The MuPDF code responsible for actually filling in the text may use a different internal CJK font for basic versus rich-text annotations.
But this needs to be investigated.
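One possible starting point for that investigation, sketched under assumptions: check which characters a given font has glyphs for. missing_glyphs and check_builtin_font are hypothetical helper names; pymupdf.Font and Font.has_glyph are existing PyMuPDF calls. Whether the font PyMuPDF loads for "china-s" is the same one MuPDF uses when it builds the annotation appearance is an assumption, and is exactly the open question here.

```python
def missing_glyphs(text, has_glyph):
    """Return the characters of `text` for which has_glyph(codepoint) is falsy.

    `has_glyph` is any callable taking a Unicode code point (int) and
    returning a glyph id, where 0 means "no glyph".
    """
    return [ch for ch in text if not has_glyph(ord(ch))]


def check_builtin_font(text, fontname="china-s"):
    """List characters of `text` that the named PyMuPDF font cannot render.

    Assumption: the Font object reflects what the annotation code uses.
    """
    import pymupdf  # imported lazily so missing_glyphs works without it

    font = pymupdf.Font(fontname)
    return missing_glyphs(text, font.has_glyph)
```

If check_builtin_font("不递交") returns an empty list, the glyph exists in that font and the problem more likely lies in which font the plain-text code path selects.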