I have a legacy third-party tool that generates a PDF with a lot of graphics and text.
This tool cannot embed fonts.
It also somehow removes spaces from font names.
I use “JetBrains Mono” for some of the text and want to embed this font into the PDF.
I managed to add the font to the PDF using PyMuPDF,
so the output from pdffonts now looks like this:
name                                 type              encoding         emb sub uni object ID
JetBrainsMono                        TrueType          WinAnsi          no  no  no      50  0
Courier-Bold                         Type 1            WinAnsi          no  no  no      52  0
fstroke3                             TrueType          WinAnsi          no  no  no      53  0
JetBrains Mono Regular               CID Type 0C (OT)  Identity-H       yes no  yes   1739  0
But it seems not every viewer can work with this. So far, the only viewer I have found that shows the correct font is the Chrome browser's PDF viewer, but it may have JetBrains Mono available somewhere on its own and may not be using the embedded version at all.
Still, this indicates that “JetBrainsMono” without spaces somehow works as a non-embedded font.
Unfortunately, I cannot use rpl_font.py: there are many hidden text items, and the document is not readable afterwards.
So can I embed a font for existing text at all?
At least I have not managed to embed a font and reuse an existing name.
I have just learned that literal white space is not allowed in PDF names.
But “JetBrains Mono” contains white space.
Maybe this should be handled when fonts are initially loaded (it seems the legacy tool knows something about this).
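For illustration, here is a minimal sketch of how a name containing a space would have to be written as a PDF name object: inside a name, white space and delimiter characters are written as `#` followed by a two-digit hex code. The helper `pdf_name` is a made-up name for this post, not a PyMuPDF function. The legacy tool apparently takes the other route and simply drops the spaces.

```python
def pdf_name(text: str) -> str:
    """Write text as a PDF name object, escaping characters as #xx where needed."""
    delimiters = b"()<>[]{}/%#"
    out = ["/"]
    for byte in text.encode("utf-8"):
        # Regular printable ASCII characters are written as themselves;
        # white space, delimiters, '#' and all other bytes become #xx.
        if 0x21 <= byte <= 0x7E and byte not in delimiters:
            out.append(chr(byte))
        else:
            out.append(f"#{byte:02X}")
    return "".join(out)


print(pdf_name("JetBrains Mono"))                   # /JetBrains#20Mono
print(pdf_name("JetBrains Mono".replace(" ", "")))  # /JetBrainsMono
```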
So I manually copied the xref and removed white-spaces:
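# Copy every key of the embedded font object (xref_src) over xrefs 53 and 50
for key in doc.xref_get_keys(xref_src):
    item = doc.xref_get_key(xref_src, key)  # (type, value)
    if key == "BaseFont":
        # strip the white space from the font name
        item = (item[0], item[1].replace(" ", ""))
    doc.xref_set_key(53, key, item[1])
    doc.xref_set_key(50, key, item[1])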
So pdffonts looks like this now:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
JetBrainsMonoRegular                 CID TrueType      Identity-H       yes no  yes     50  0
Courier-Bold                         Type 1            WinAnsi          no  no  no      52  0
JetBrainsMonoRegular                 CID TrueType      Identity-H       yes no  yes     53  0
fstroke3                             TrueType          WinAnsi          no  no  no      55  0
JetBrains Mono Regular               CID TrueType      Identity-H       yes no  yes   1741  0
But now all text in the document is gone, so this is not successful yet.
I will try another font next to rule out a naming issue.
I just had another look at this, and I finally got it working.
The intention was to embed a font into an existing PDF in which it was not embedded before,
but which is actually the same font the text already uses.
The final trick was to pass set_simple=True: page.insert_font(fontbuffer=font.buffer, fontname=fontname, set_simple=True)
Example:
import pymupdf

doc = pymupdf.open("in.pdf")
page = doc[0]  # first page

fontname = "JBMONO"
fontfile = "JetBrainsMono-Regular.ttf"
font = pymupdf.Font(fontfile=fontfile)
# set_simple=True enforces treatment as a "simple" font (character codes up to 255)
page.insert_font(fontbuffer=font.buffer, fontname=fontname, set_simple=True)

# Each entry: (xref, ext, type, basefont, name, encoding, referencer)
fts = page.get_fonts(full=True)
xref_src = None  # the font just embedded
xref_dst = []    # non-embedded fonts to be redirected
for ft in fts:
    if ft[1] == "n/a" and ft[3].startswith("JetBrainsMono"):
        xref_dst.append(ft[0])
        print(f"{ft} < Found dst font: xref_dst = {ft[0]}")
    elif ft[3] == "JetBrainsMono-Regular":
        xref_src = ft[0]
        print(f"{ft} < Found source font: xref_src = {xref_src}")
    else:
        print(ft)

if xref_src is None:
    raise SystemExit("Embedded source font not found on this page")

# Copy every key of the embedded font object over each non-embedded one
for xref in xref_dst:
    print(f"Updating font reference for xref {xref} to point to xref {xref_src}")
    for key in doc.xref_get_keys(xref_src):
        item = doc.xref_get_key(xref_src, key)
        if key == "BaseFont":
            item = (item[0], item[1].replace(" ", ""))  # not sure if needed
        doc.xref_set_key(xref, key, item[1])

print("\nUpdated font references:")
for ft in page.get_fonts(full=True):  # re-read; fts was fetched before the update
    print(ft)

# Save the output
doc.save("out.pdf", garbage=4, deflate=True, use_objstms=1)
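The selection step of the script can also be pulled out into a small pure function, which makes it easy to check against a list of font tuples without opening a PDF. This is only a sketch: `pick_fonts` is a made-up name, and the sample tuples are hypothetical values modelled on the (xref, ext, type, basefont, name, encoding, referencer) entries that page.get_fonts(full=True) returns.

```python
def pick_fonts(fonts, prefix="JetBrainsMono", source="JetBrainsMono-Regular"):
    """Return (xref_src, xref_dst) from get_fonts(full=True)-style tuples."""
    xref_src = None
    xref_dst = []
    for xref, ext, _type, basefont, *_rest in fonts:
        if ext == "n/a" and basefont.startswith(prefix):
            xref_dst.append(xref)  # not embedded: to be redirected
        elif basefont == source:
            xref_src = xref        # the freshly embedded font
    return xref_src, xref_dst


# Hypothetical sample data, not taken from a real file
sample = [
    (50, "n/a", "TrueType", "JetBrainsMono", "F1", "WinAnsiEncoding", 0),
    (52, "n/a", "Type1", "Courier-Bold", "F2", "WinAnsiEncoding", 0),
    (1739, "ttf", "TrueType", "JetBrainsMono-Regular", "JBMONO", "WinAnsiEncoding", 0),
]
print(pick_fonts(sample))  # (1739, [50])
```

If `xref_src` comes back as None, the embedded font was not found on the page and nothing should be copied.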
You cannot make existing text in a PDF use a different font simply by copying the new font's xref over the old font's xref.
There are multiple reasons: (1) different fonts almost certainly have different glyph addresses (numbers), (2) different fonts have different glyph metrics (width, height, ascender, descender, …).
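A toy illustration of reason (1), with made-up glyph numbers: when text is stored as glyph numbers of one font (roughly what happens with an Identity-H encoding), reading those numbers through another font's table yields different characters. Both tables below are hypothetical and do not come from any real font.

```python
# Hypothetical character -> glyph number tables of two different fonts
font_a = {"H": 43, "i": 76, "!": 4}
font_b = {"H": 76, "i": 4, "!": 43}


def encode(text, table):
    """Turn text into the glyph numbers stored for this font."""
    return [table[ch] for ch in text]


def decode(glyphs, table):
    """Look up which characters a font shows for the stored glyph numbers."""
    reverse = {gid: ch for ch, gid in table.items()}
    return "".join(reverse.get(gid, "?") for gid in glyphs)


stored = encode("Hi!", font_a)  # text written with font A
print(decode(stored, font_a))   # Hi!
print(decode(stored, font_b))   # !Hi
```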
This is a path that can almost never be successful.
… Except if you are lucky enough to be using exactly the same font again. In which case a justified question would be: “Why are you doing this at all?”
If setting “simple” to true was successful, then it is because the original creation of the text did the same (it causes only the Unicodes up to number 255 to be used).
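To check up front whether a given text would fit into such a simple font, one can test that every character has a code point of at most 255. This is a minimal sketch in plain Python; `fits_simple_font` and `offending_chars` are hypothetical helpers, not part of PyMuPDF, and they only check the code point range, not whether the font actually contains a glyph for each character.

```python
def fits_simple_font(text: str) -> bool:
    """True if every character has a Unicode code point of at most 255."""
    return all(ord(ch) <= 255 for ch in text)


def offending_chars(text: str) -> list:
    """Characters outside the 0..255 range, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 255 and ch not in seen:
            seen.append(ch)
    return seen


print(fits_simple_font("Größe: 10 mm"))  # True
print(offending_chars("π ≈ 3.14 €"))     # ['π', '≈', '€']
```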