BUG: list index out of range using new layout feature

marcelrassinger · December 1, 2025, 8:47am

I am not allowed to share the document, but maybe the exception helps as well:

task_agent/document_processing/parser/pymupdf_reader.py:71: in to_markdown
md = pymupdf4llm.to_markdown(
../../../anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py:83: in to_markdown
parsed_doc = parse_document(
../../../anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py:42: in parse_document
return document_layout.parse_document(
../../../anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/helpers/document_layout.py:908: in parse_document
utils.clean_tables(page, blocks)
../../../anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/helpers/utils.py:261: in clean_tables
y_vals = [y_vals0[0]]

marcelrassinger · December 1, 2025, 8:56am

Versions:
- pymupdf-layout==1.26.6
- pymupdf4llm==0.2.5

The table spans more than one page.

Jamie_Lemon · December 1, 2025, 10:25pm

Good to know that your table spans more than one page. I think this might be the same issue as "BUG: pymupdf4llm list index out of range in document_layout.py". Will investigate.

Jamie_Lemon · December 2, 2025, 1:50pm

I can see you are on the latest version of pymupdf4llm, so it isn’t related to the other issue. It is hard to test without the document. If you set show_progress to True, can you see how far it gets into the document?
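If the document cannot be shared, one way to narrow this down is to convert one page at a time and record which pages raise. The sketch below is only an illustration and not part of pymupdf4llm: `find_failing_pages` and `convert_page` are hypothetical names, and the commented wiring assumes that `to_markdown` accepts a `pages` list of 0-based page numbers.

```python
from typing import Callable


def find_failing_pages(
    convert_page: Callable[[int], str], page_count: int
) -> dict[int, str]:
    """Convert pages one at a time; return {page_number: error summary}."""
    failures: dict[int, str] = {}
    for pno in range(page_count):
        try:
            convert_page(pno)
        except Exception as exc:  # broad on purpose: we only want to locate the page
            failures[pno] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical wiring (assumes pymupdf and pymupdf4llm are installed):
#
#   import pymupdf, pymupdf4llm
#   doc = pymupdf.open("example.pdf")
#   bad = find_failing_pages(
#       lambda pno: pymupdf4llm.to_markdown(doc, pages=[pno]),
#       doc.page_count,
#   )
#   print(bad)
```

Reporting the failing page numbers together with the exception text may be enough to reproduce the problem when the PDF itself is confidential.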

qbuchanan · December 2, 2025, 7:15pm

Just want to bump this up as we are experiencing the issue as well

Jamie_Lemon · December 2, 2025, 9:23pm

Hi @qbuchanan, welcome to the forum! Are you able to share your PDF? Also, can you confirm your versions of PyMuPDF Layout and PyMuPDF4LLM? (I’m hoping 1.26.6 and 0.2.5.)

marcelrassinger · December 3, 2025, 6:44am

Hi Jamie

Attached you will find an example that should help to reproduce the issue.

Regards, Marcel

(attachments)

example.pdf (54.5 KB)

qbuchanan · December 3, 2025, 1:11pm

PyMuPDF~=1.26.6

pymupdf-layout~=1.26.6

pymupdf4llm~=0.2.5

Jamie_Lemon · December 3, 2025, 1:37pm

Thanks Marcel - the document really helps, will investigate.

Jamie_Lemon · December 3, 2025, 9:34pm

@marcelrassinger This should hopefully be fixed for you with the new version of PyMuPDF4LLM, 0.2.6 (pip install pymupdf4llm==0.2.6).
@qbuchanan Perhaps you can give things a go again with the latest version? Basically, an error in some of the object classification in the previous version caused the issue.

Please let me know how it goes for you and if your issues are resolved!
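Before retesting, it may help to confirm that the environment really picked up the new release. The following is a small sketch using only the standard library; `version_tuple` and `has_fix` are hypothetical helper names, and the simple numeric comparison assumes plain `X.Y.Z` version strings.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '0.2.6' into (0, 2, 6); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed: str, minimum: str = "0.2.6") -> bool:
    """True if the installed version is at least the release named in this thread."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = version("pymupdf4llm")
    except PackageNotFoundError:
        print("pymupdf4llm is not installed in this environment")
    else:
        print(installed, "ok" if has_fix(installed) else "older than 0.2.6")
```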

marcelrassinger · December 4, 2025, 7:20am

Hi Jamie,

Bug is fixed, thank you!

However, there seems to be another small glitch.

I call:

md = pymupdf4llm.to_markdown(
    doc="pdf-path",
    write_images=True,
    image_path="my-image-path",
    embed_images=False,
)

After processing, I get images for each page in the image path (see attached zip file), but I also get one image placed beside the parsed PDF file:

It looks like the logo:

This happens in all my test cases. On purpose?

Can I switch it off?

Thanks,
Marcel

(Attachment pdf_parser.zip is missing)

marcelrassinger · December 4, 2025, 7:22am

Hi Jamie,

Bug is fixed, thank you!

However, there seems to be another small glitch.

I call:

md = pymupdf4llm.to_markdown(
    doc="pdf-path",
    write_images=True,
    image_path="my-image-path",
    embed_images=False,
)

After processing, I get images for each page in the image path, but I also get one image placed beside the parsed PDF file:

It looks like the logo:

many-csv-order-positions.pdf-0001-00.png

This happens in all my test cases. On purpose?

Can I switch it off?

Thanks,
Marcel

marcelrassinger · December 4, 2025, 9:19am

Another bug?

I call:

md = pymupdf4llm.to_markdown(
    doc='storage/medidor-test.ch/email_data/gmail.com/esid_025d93140ff06a84636eee46426608433dfdd3dec4c3a9c73a9e3a095b127526/Bestellung 94833.pdf',
    write_images=True,
    image_path='storage/medidor-test.ch/parsing_working_dir/esid_025d93140ff06a84636eee46426608433dfdd3dec4c3a9c73a9e3a095b127526/pdf_parser/md_pymupdf4llm_conversion',
    embed_images=False,
)

And I get the following exception:

  File "/Users/mara/Code/agents/task_agent/task_agent/document_processing/parser/pymupdf_reader.py", line 71, in to_markdown
    md = pymupdf4llm.to_markdown(
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 83, in to_markdown
    parsed_doc = parse_document(
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 42, in parse_document
    return document_layout.parse_document(
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/helpers/document_layout.py", line 963, in parse_document
    pix.save(layoutbox.image)
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13894, in save
    return self._writeIMG(filename, idx, jpg_quality)
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13573, in _writeIMG
    if format_ == 1: mupdf.fz_save_pixmap_as_png(pm, filename)
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/mupdf.py", line 51161, in fz_save_pixmap_as_png
    return _mupdf.fz_save_pixmap_as_png(pixmap, filename)

pymupdf.mupdf.FzErrorSystem: code=2: cannot open file 'storage/medidor-test.ch/parsing_working_dir/esid_025d93140ff06a84636eee46426608433dfdd3dec4c3a9c73a9e3a095b127526/pdf_parser/md_pymupdf4llm_conversion/storage/medidor-test.ch/email_data/gmail.com/esid_025d93140ff06a84636eee46426608433dfd

Somehow the paths get concatenated…

Do I use it incorrectly? But then, why did it work before?
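The truncated message above is consistent with the image directory being joined with the full doc string rather than with the file name alone. The following is only an illustration of that effect using plain path joins. It is not taken from the pymupdf4llm source, the paths and function names are made up, and the `-0001-00.png` suffix is copied from the file name reported earlier in this thread.

```python
import posixpath


def joined_with_full_path(image_path: str, doc: str, suffix: str = "-0001-00.png") -> str:
    """What you get if the whole doc string is appended to image_path."""
    return posixpath.join(image_path, doc + suffix)


def joined_with_basename(image_path: str, doc: str, suffix: str = "-0001-00.png") -> str:
    """What you get if only the PDF's file name is appended."""
    return posixpath.join(image_path, posixpath.basename(doc) + suffix)


# Hypothetical paths, chosen only to keep the output short.
print(joined_with_full_path("work/images", "mail/inbox/order.pdf"))
print(joined_with_basename("work/images", "mail/inbox/order.pdf"))
```

The first result points into a subdirectory of the image folder that normally does not exist, which would explain a "cannot open file" error when the image is saved.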

Jamie_Lemon · December 5, 2025, 12:57am

@marcelrassinger I’m unable to replicate your issue - I don’t think there is a character length limit for the image_path value. When I ask for images to be extracted, they faithfully go to the folder I define. I don’t have your attached zip file, so I didn’t try with your “many-csv-order-positions.pdf”.

marcelrassinger · December 6, 2025, 8:12am

Hi Jamie,

Below you will find a simple example that reproduces the issue. Running the code results in an exception. It works if you comment out the layout import.

import pymupdf.layout
import pymupdf4llm
md = pymupdf4llm.to_markdown(
    doc="./pdfs/example.pdf",
    write_images=True,
    image_path="./images",
    embed_images=False,
)
print(md)

The folder structure is:

The strange thing is that it also works if I move example.pdf into the same folder as the Python script and set doc="./example.pdf".

I use Python 3.13.5

Regards, Marcel

Exception:

python pymupdf_example.py

Traceback (most recent call last):
  File "/Users/mara/Downloads/test/pymupdf_example.py", line 3, in <module>
    md = pymupdf4llm.to_markdown(
        doc="./pdfs/example.pdf",
    ...<2 lines>...
        embed_images=False,
    )
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 83, in to_markdown
    parsed_doc = parse_document(
        doc,
    ...<10 lines>...
        use_ocr=use_ocr,
    )
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/__init__.py", line 42, in parse_document
    return document_layout.parse_document(
        doc,
        ^^^^
    ...<10 lines>...
        use_ocr=use_ocr,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf4llm/helpers/document_layout.py", line 963, in parse_document
    pix.save(layoutbox.image)
    ~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13894, in save
    return self._writeIMG(filename, idx, jpg_quality)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/__init__.py", line 13573, in _writeIMG
    if format_ == 1: mupdf.fz_save_pixmap_as_png(pm, filename)
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/Users/mara/anaconda3/envs/task-agent/lib/python3.13/site-packages/pymupdf/mupdf.py", line 51161, in fz_save_pixmap_as_png
    return _mupdf.fz_save_pixmap_as_png(pixmap, filename)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
pymupdf.mupdf.FzErrorSystem: code=2: cannot open file './images/./pdfs/example.pdf-0001-00.png': No such file or directory
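Going by the observation above that the call succeeds when example.pdf sits next to the script, one possible stopgap is to change into the PDF's folder for the duration of the call and pass an absolute image directory. This is an untested sketch: `plan_conversion` is a hypothetical helper, `contextlib.chdir` needs Python 3.11 or newer, and whether it avoids the error depends on how the installed pymupdf4llm builds the image file name.

```python
from pathlib import Path


def plan_conversion(doc: str, image_path: str) -> tuple[Path, str, Path]:
    """Return (folder to chdir into, bare PDF name, absolute image directory)."""
    pdf = Path(doc).resolve()
    return pdf.parent, pdf.name, Path(image_path).resolve()


# Hypothetical usage (assumes pymupdf4llm is installed):
#
#   import contextlib, pymupdf4llm
#   folder, name, images = plan_conversion("./pdfs/example.pdf", "./images")
#   images.mkdir(parents=True, exist_ok=True)
#   with contextlib.chdir(folder):
#       md = pymupdf4llm.to_markdown(
#           doc=name, write_images=True,
#           image_path=str(images), embed_images=False,
#       )
```

Resolving both paths before changing directory keeps the image folder where the caller expects it, regardless of the working directory used during conversion.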

qbuchanan · December 6, 2025, 7:24pm

That appears to have fixed the issue
