Why is this graphic NOT extracted as images by pymupdf4llm.to_markdown(write_images=True)

Steen_Larsen · July 21, 2025, 3:10pm

I am using pymupdf4llm to extract patient leaflet information from PDF documents published by the European Medicines Agency. The PDF file contains a small transparent graphic showing a blue triangle with an exclamation mark inside. For some reason it is not exported and not referenced in the resulting md file. Other graphics work fine. Obviously this is important safety information.
Does anybody have an idea why?
The file is: https://www.ema.europa.eu/en/documents/product-information/fiasp-epar-product-information_en.pdf (the first missing blue triangle is on page 61).
The code I use is the one below.

import pathlib

import pymupdf4llm

md_text = pymupdf4llm.to_markdown("/pleaflet/samples/fiasp-epar-product-information_en.pdf", write_images=True, force_text=False)

# now work with the markdown text, e.g. store as a UTF8-encoded file
pathlib.Path("noutput.md").write_bytes(md_text.encode())

Jamie_Lemon · July 21, 2025, 3:47pm

I think this is to do with the image_size_limit parameter as defined here:

image_size_limit (float) – this must be a positive value less than 1. Images are ignored if width / page.rect.width <= image_size_limit or height / page.rect.height <= image_size_limit. For instance, the default value 0.05 means that to be considered for inclusion, an image’s width and height must be larger than 5% of the page’s width and height, respectively.

So I think if you define this as 0 then you should see it in the output.
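To make the rule concrete, here is a minimal sketch in plain Python. The helper image_is_ignored is a hypothetical name that only restates the documented comparison; it is not part of pymupdf4llm. The page size (A4, 595 x 842 points) and the icon size (20 x 18 points) are assumed values for illustration, not measured from the EMA file. convert_keeping_small_images is likewise a made-up wrapper around the to_markdown call with image_size_limit=0.

```python
def image_is_ignored(width, height, page_width, page_height, image_size_limit=0.05):
    """Restate the documented rule: an image is skipped when either its
    width or its height, relative to the page, is <= image_size_limit."""
    return (
        width / page_width <= image_size_limit
        or height / page_height <= image_size_limit
    )


def convert_keeping_small_images(pdf_path):
    """Hypothetical wrapper: run to_markdown with the size filter set to 0."""
    import pymupdf4llm  # imported lazily so the rule above runs without it

    return pymupdf4llm.to_markdown(
        pdf_path,
        write_images=True,
        image_size_limit=0,  # keep images regardless of their relative size
    )


# Assumed numbers: a 20 x 18 pt icon on an A4 page (595 x 842 pt).
PAGE_W, PAGE_H = 595, 842
print(image_is_ignored(20, 18, PAGE_W, PAGE_H))     # True: 20/595 is below 5%
print(image_is_ignored(20, 18, PAGE_W, PAGE_H, 0))  # False: nothing is <= 0
```

With the default limit, a small warning icon like this falls under the 5% threshold and is dropped, which would match the behaviour described in the question. With a limit of 0, neither ratio can be less than or equal to the limit, so the image is kept.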

Steen_Larsen · July 21, 2025, 5:50pm

Thanks a lot for your quick answer! Setting the parameter to 0 solved my problem. Sorry, this was an RTFM problem! I will now check your documentation link. Thanks again for your help!

Jamie_Lemon · July 21, 2025, 8:30pm

No worries - happy coding!
