Bug: pymupdf4llm: image path handling

Running the code below raises an exception (the full traceback is at the end of this post):

pymupdf.mupdf.FzErrorSystem: code=2: cannot open file './images/./pdfs/example.pdf-0001-00.png': No such file or directory

It works if you comment out the pymupdf.layout import.

import pymupdf.layout
import pymupdf4llm

md = pymupdf4llm.to_markdown(
    doc="./pdfs/example.pdf",
    write_images=True,
    image_path="./images",
    embed_images=False,
)

The folder structure is:

Another strange observation: it also works if I move example.pdf into the same folder as the Python script and set doc="./example.pdf".
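One possible reading of the error message is that the image file name is built from the whole doc string, sub-folder included, rather than from the file name alone. The sketch below only illustrates that reading. Both helper functions are hypothetical and are not taken from the pymupdf4llm source; posixpath is used so the output is the same on every platform.

```python
import posixpath


def naive_image_name(image_path, doc, page, index):
    # Hypothetical: the full doc string (including "./pdfs/") ends up
    # inside the image file name.
    return posixpath.join(image_path, f"{doc}-{page:04d}-{index:02d}.png")


def basename_image_name(image_path, doc, page, index):
    # Hypothetical alternative: keep only the file name of the document.
    name = posixpath.basename(doc)
    return posixpath.join(image_path, f"{name}-{page:04d}-{index:02d}.png")


print(naive_image_name("./images", "./pdfs/example.pdf", 1, 0))
print(basename_image_name("./images", "./pdfs/example.pdf", 1, 0))
```

Under this assumption, the first call reproduces the path from the error message, which points into a folder ./images/pdfs that does not exist. With doc="./example.pdf" the same logic yields ./images/./example.pdf-0001-00.png, which still resolves to the existing ./images folder, and that would be consistent with the observation above.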

Versions:

  • Python 3.13.5
  • pymupdf-layout 1.26.6
  • pymupdf4llm 0.2.7

Exception:


python pymupdf_example.py
Traceback (most recent call last):
  File "/Users/mara/Downloads/test/pymupdf_example.py", line 3, in <module>
    md = pymupdf4llm.to_markdown(
        doc="./pdfs/example.pdf",
    ...<2 lines>...
        embed_images=False,
    )
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 83, in to_markdown
    parsed_doc = parse_document(
        doc,
    ...<10 lines>...
        use_ocr=use_ocr,
    )
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 42, in parse_document
    return document_layout.parse_document(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        doc,
        ^^^^
    ...<10 lines>...
        use_ocr=use_ocr,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/helpers/document_layout.py", line 1021, in parse_document
    pix.save(layoutbox.image)
    ~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13894, in save
    return self._writeIMG(filename, idx, jpg_quality)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13573, in _writeIMG
    if   format_ == 1:  mupdf.fz_save_pixmap_as_png(pm, filename)
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/mupdf.py", line 51161, in fz_save_pixmap_as_png
    return _mupdf.fz_save_pixmap_as_png(pixmap, filename)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
pymupdf.mupdf.FzErrorSystem: code=2: cannot open file './images/./pdfs/example.pdf-0001-00.png': No such file or directory

@Jamie_Lemon : Hi Jamie, is it possible to find out whether this bug (Bug: pymupdf4llm: image path handling) will make it into the backlog? It would help me with planning. If you consider the bug not important enough, I could start looking for alternatives; otherwise I would wait for the planned fix.

@marcelrassinger We should have a fix ready for this (and are just testing it) in the next release of PyMuPDF4LLM (0.2.8) which I think will be released within the next couple of days or early next week. Thanks for the report - I will ping you here as soon as it makes its way onto PyPI!

That is great news, thanks!

@Jamie_Lemon : Hi Jamie, it seems that the release is more challenging than I expected. Do you have an updated plan for when 0.2.8 will be released?

Thanks, Marcel

Hi @marcelrassinger I am pretty confident that 0.2.8 will be released by the end of this week or the beginning of next week. It will happen soon!

Hi @marcelrassinger Please note PyMuPDF4LLM 0.2.8 is now available. Happy New Year!

Hi Jamie

Thanks and a happy new year too!

Most things work now with 0.2.8, but in one case I still get an exception. Below you will find the code and the exception; the PDF is attached.
Again, it works if I comment out pymupdf.layout.

Regards,
Marcel

The test code:

import pymupdf.layout
import pymupdf4llm
import os

print(f"Current directory: {os.getcwd()}")

md = pymupdf4llm.to_markdown(
    doc="./pdf/example.pdf",
    write_images=True,
    image_path="./images",
    embed_images=False,
)

print(md)

The dir structure is:

image.png

The exception I got:
Traceback (most recent call last):
  File "/Users/mara/Code/agents/task_agent/customer-todos/pymupdftest/test.py", line 7, in <module>
    md = pymupdf4llm.to_markdown(
        doc="./pdf/example.pdf",
    ...<2 lines>...
        embed_images=False,
    )
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 86, in to_markdown
    parsed_doc = parse_document(
        doc,
    ...<11 lines>...
        ocr_language=ocr_language,
    )
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 43, in parse_document
    return document_layout.parse_document(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        doc,
        ^^^^
    ...<11 lines>...
        ocr_language=ocr_language,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/helpers/document_layout.py", line 1061, in parse_document
    pix.save(save_img_filename)
    ~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13894, in save
    return self._writeIMG(filename, idx, jpg_quality)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13573, in _writeIMG
    if   format_ == 1:  mupdf.fz_save_pixmap_as_png(pm, filename)
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/mupdf.py", line 51161, in fz_save_pixmap_as_png
    return _mupdf.fz_save_pixmap_as_png(pixmap, filename)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
pymupdf.mupdf.FzErrorSystem: code=2: cannot open file 'images/pdf/example.pdf-0001-01.png': No such file or directory

[details="(attachments)"]

[example.pdf|attachment](upload://bMbfl9IMa5N41wjHinrLZtApgsp.pdf) (93.9 KB)

[/details]

Yes - I can reproduce this problem. It is not specific to the PDF; rather, there seems to be some broken logic in saving images when the source PDF is in a subfolder and pymupdf.layout is imported. As you have also confirmed, it works as expected for me when I omit pymupdf.layout.
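Until a fixed release is installed, one stopgap that might avoid the "No such file or directory" error is to create the sub-folder that the failing path points into before calling to_markdown. This is only a sketch under that assumption and has not been verified against the library; ensure_image_subdir is a hypothetical helper, and it assumes doc is a relative path.

```python
import os


def ensure_image_subdir(image_path, doc):
    """Create image_path/<folder part of doc>, e.g. images/pdf for
    doc="./pdf/example.pdf", and return the created folder.

    Assumes doc is a relative path: with an absolute path,
    os.path.join would discard image_path.
    """
    target = os.path.normpath(os.path.join(image_path, os.path.dirname(doc)))
    os.makedirs(target, exist_ok=True)
    return target


if __name__ == "__main__":
    # Matches the folder in the error message above: images/pdf
    print(ensure_image_subdir("./images", "./pdf/example.pdf"))
```

If this works, the images would land in the sub-folder rather than directly in ./images, so it is a way to get unblocked and not a replacement for a proper fix.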

@HaraldLieder @Robin_Watts I have created an issue for this on the internal Github tracker for your consideration.

@marcelrassinger It could be that we need to release an update for the PyMuPDF Layout package, or the PyMuPDF4LLM package - I’m not sure at this stage but will keep you posted!

@marcelrassinger It looks like we have an update here which should resolve your problem with PyMuPDF4LLM version 0.2.9 - please let me know how it works for you!

This should now have finally been fixed in version 0.2.9 of PyMuPDF4LLM.

Hi Jamie, it works fine now, thanks! And sorry for the late answer.


@marcelrassinger No worries! Glad it works now and thanks for your feedback!

@Jamie_Lemon Hi Jamie! Following up on this post, may I ask whether there is any plan to make the filenames of the extracted images customizable? By default, the images all seem to follow the format <filename>-<page>-<index>.png, which is enough to make the image filenames unique. I wonder whether it would be better to make this customizable, or simply to strip the ".pdf" extension (if any) so that the image filename does not contain two "." characters (one from .pdf, the other from .png, e.g., abc.pdf-0001-00.png).

Hi @Joseph_Bai ! This seems like a reasonable idea, another parameter I guess, something like image_filename_prefix.
@HaraldLieder What do you think of this kind of enhancement?


We already support a filename parameter. It can be used if the doc is in memory (and hence doc.name=""), but it can also be used for a file-based document. So put whatever you like in filename and you get what you want.
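As a rough illustration of this suggestion, the sketch below passes the PDF's name without its extension as filename, so the image names would be expected to start with "abc" rather than "abc.pdf". The helpers image_prefix and convert are hypothetical names introduced here, and the exact resulting image names are an assumption, not confirmed output.

```python
from pathlib import Path


def image_prefix(pdf_path):
    # File name without folder and without the ".pdf" extension.
    return Path(pdf_path).stem


def convert(pdf_path, image_dir="./images"):
    # Imported lazily so the helper above can be used without the package.
    import pymupdf4llm

    return pymupdf4llm.to_markdown(
        doc=pdf_path,
        filename=image_prefix(pdf_path),  # custom prefix for image names
        write_images=True,
        image_path=image_dir,
        embed_images=False,
    )


if __name__ == "__main__":
    print(image_prefix("./pdfs/abc.pdf"))
```

Any other string could be passed as filename in the same way, for example a prefix that encodes a customer or batch ID.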


Thank you for your and @Jamie_Lemon's replies! I overlooked that parameter; it's exactly what I was looking for!
