How to fix code=4: no font file for digest?

eamag · June 29, 2025, 11:47am

I’m trying to extract text from this PDF, https://openreview.net/pdf?id=g90RNzs8wX, using pymupdf4llm.to_markdown(pdf_path). Is there a way to fix a font error? Thanks!

Jamie_Lemon · June 30, 2025, 2:07pm

Interesting, I think I see the error on page 26:
[========================================e=RuntimeError('code=4: no font file for digest')

I was running the following command:

md_text = pymupdf4llm.to_markdown("1522_Unifying_Unsupervised_Gra.pdf", page_chunks=False, extract_words=False, show_progress=True)

If I extract that page, then it works (see my 1522_Unifying_Unsupervised_Gra-edit.pdf file).
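One way to narrow a failure like this down to a single page is to convert the document one page at a time and record which pages raise. The following is only a minimal sketch: the helper name to_markdown_skipping_failures is hypothetical, it relies on the pages argument of pymupdf4llm.to_markdown (a list of 0-based page numbers), and it assumes the failure surfaces as a RuntimeError, as the log above suggests.

```python
from typing import Callable, List, Tuple


def _pymupdf4llm_convert(pdf_path: str, page_no: int) -> str:
    """Convert one 0-based page; pymupdf4llm is imported lazily."""
    import pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path, pages=[page_no])


def to_markdown_skipping_failures(
    pdf_path: str,
    page_count: int,
    convert: Callable[[str, int], str] = _pymupdf4llm_convert,
) -> Tuple[str, List[Tuple[int, str]]]:
    """Hypothetical helper: convert page by page, collecting failures.

    Returns the joined Markdown of the pages that converted, plus a list
    of (0-based page number, error message) for the pages that did not.
    """
    parts: List[str] = []
    failed: List[Tuple[int, str]] = []
    for page_no in range(page_count):
        try:
            parts.append(convert(pdf_path, page_no))
        except RuntimeError as exc:  # assumption: the error is a RuntimeError
            failed.append((page_no, str(exc)))
    return "\n".join(parts), failed


# Possible usage (needs pymupdf and pymupdf4llm installed):
#   import pymupdf
#   path = "1522_Unifying_Unsupervised_Gra.pdf"
#   with pymupdf.open(path) as doc:
#       n = doc.page_count
#   md_text, failed = to_markdown_skipping_failures(path, n)
#   print(failed)
```

Converting page by page is slower than a single call and may change how content spanning pages is handled, so this is better suited to diagnosis than to production output.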

@HaraldLieder What do you think is “wrong” with page 26 here?

1522_Unifying_Unsupervised_Gra-26.pdf (720.9 KB)
1522_Unifying_Unsupervised_Gra-edit.pdf (1.0 MB)

Jamie_Lemon · June 30, 2025, 2:08pm

Also @eamag Welcome to the forum and thanks for your post!!!

HaraldLieder · June 30, 2025, 2:28pm

This is caused by an upstream (MuPDF) problem. Recent versions of PyMuPDF4LLM make active use of MuPDF’s advanced detection of “faked” bold text: text written with a standard (non-bold) font that is made to appear bold by writing the same text twice with a small displacement.

This algorithm is quite complex and only works for non-Type 3 fonts. The error you report currently happens because of a missing check for text in a Type 3 font.
A MuPDF bug report has already been submitted.
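Until the upstream fix lands, it may help to find out which pages reference a Type 3 font at all. The sketch below is an assumption-laden illustration, not part of PyMuPDF4LLM: the helper names are hypothetical, and it assumes that PyMuPDF's page.get_fonts() returns tuples of the form (xref, ext, type, basefont, name, encoding) with the type string "Type3" for such fonts. A font listed for a page is not necessarily used by text on it, so treat the result as a hint only.

```python
from typing import Iterable, List, Sequence


def uses_type3_font(fonts: Iterable[Sequence]) -> bool:
    """Return True if any font entry has type "Type3".

    `fonts` is expected to look like the output of page.get_fonts():
    (xref, ext, type, basefont, name, encoding) per entry.
    """
    return any(len(f) > 2 and f[2] == "Type3" for f in fonts)


def pages_with_type3(pdf_path: str) -> List[int]:
    """Hypothetical helper: 0-based numbers of pages listing a Type 3 font."""
    import pymupdf  # lazy import; older installs may need `import fitz`

    with pymupdf.open(pdf_path) as doc:
        return [p.number for p in doc if uses_type3_font(p.get_fonts())]


# Possible usage:
#   print(pages_with_type3("1522_Unifying_Unsupervised_Gra.pdf"))
```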
