I am using pymupdf4llm to extract patient leaflet information from PDF documents published by the European Medicines Agency. The PDF contains a small transparent graphic showing a blue triangle with an exclamation mark inside. For some reason, this graphic is neither exported nor referenced in the resulting .md file, while other graphics work fine. Obviously, this is important safety information.
Does anybody have an idea why?
The file is: https://www.ema.europa.eu/en/documents/product-information/fiasp-epar-product-information_en.pdf (the first missing blue triangle is on page 61).
And this is the code I use:
```python
import pathlib

import pymupdf4llm

md_text = pymupdf4llm.to_markdown(
    "/pleaflet/samples/fiasp-epar-product-information_en.pdf",
    write_images=True,
    force_text=False,
)

# now work with the markdown text, e.g. store as a UTF8-encoded file
pathlib.Path("noutput.md").write_bytes(md_text.encode())
```
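One possible cause, which is an assumption and not confirmed: `pymupdf4llm.to_markdown` has an `image_size_limit` parameter (default 0.05) that skips images whose width or height is small relative to the page, and a small warning icon could fall below it. Another possibility is that the triangle is a vector drawing rather than an embedded image. The sketch below uses PyMuPDF's `Page.get_image_info()` and `Page.get_drawings()` to list what is actually on a given page and flag anything under that threshold. The names `too_small`, `inspect_page` and `to_markdown_keep_small` are hypothetical helpers, and the rule in `too_small` is my reading of the documented `image_size_limit` behaviour, not the library's own code.

```python
import sys


def too_small(width: float, height: float, page_width: float,
              page_height: float, limit: float = 0.05) -> bool:
    """Return True if either side is at most `limit` times the page side.

    Assumption: this mirrors the rule described for pymupdf4llm's
    `image_size_limit` parameter (default 0.05).
    """
    return width <= limit * page_width or height <= limit * page_height


def inspect_page(path: str, page_number: int, limit: float = 0.05) -> None:
    """List images and small filled vector paths on one page (1-based number)."""
    import pymupdf  # lazy import so too_small() works without PyMuPDF installed

    with pymupdf.open(path) as doc:
        page = doc[page_number - 1]  # PyMuPDF page indices are 0-based
        pw, ph = page.rect.width, page.rect.height
        print(f"page {page_number}: {pw:.0f} x {ph:.0f} pt")

        # Raster images placed on the page, with their on-page bounding boxes
        for info in page.get_image_info(xrefs=True):
            r = pymupdf.Rect(info["bbox"])
            note = "below size limit" if too_small(r.width, r.height, pw, ph, limit) else ""
            print(f"image xref={info['xref']} bbox={r} {note}")

        # Small filled vector paths: the icon may be drawn rather than embedded
        for d in page.get_drawings():
            r = d["rect"]
            if d.get("fill") is not None and too_small(r.width, r.height, pw, ph, limit):
                print(f"small filled path fill={d['fill']} rect={r}")


def to_markdown_keep_small(path: str, limit: float = 0.01) -> str:
    """Repeat the original conversion with a lower image size threshold.

    The value 0.01 is an arbitrary example, not a recommended setting.
    """
    import pymupdf4llm

    return pymupdf4llm.to_markdown(
        path, write_images=True, force_text=False, image_size_limit=limit
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Run only when a path and page number are given, e.g.:
    # python inspect_page.py fiasp-epar-product-information_en.pdf 61
    inspect_page(sys.argv[1], int(sys.argv[2]))
```

If the icon is listed as an image "below size limit", calling `to_markdown_keep_small(path)` may get it exported. If it only shows up as a small filled path, the image size threshold is probably not the explanation and the question becomes how pymupdf4llm treats small vector graphics.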