BUG: pymupdf4llm list index out of range in document_layout.py (2)

robvd · December 3, 2025, 9:51am

I ran into another "list index out of range" error. Parsing a large file with pymupdf.layout + pymupdf4llm produces the following traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/__init__.py", line 83, in to_markdown
    parsed_doc = parse_document(
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/__init__.py", line 42, in parse_document
    return document_layout.parse_document(
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/helpers/document_layout.py", line 908, in parse_document
    utils.clean_tables(page, blocks)
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/helpers/utils.py", line 261, in clean_tables
    y_vals = [y_vals0[0]]
IndexError: list index out of range

Versions:

pymupdf4llm: 0.2.5

pymupdf-layout: 1.26.6

The commands used were:

import pymupdf.layout
import pymupdf4llm
doc = pymupdf.open(pdf_name)
md_chunks = pymupdf4llm.to_markdown(doc)

The size of the PDF file is 142MB so I cannot upload it here.

P.S. These files are part of the Dutch government's open data and are important to parse. Unfortunately, they vary greatly in quality and size. On the other hand, that makes them great test cases.
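
When a file is too large to share, one way to narrow a report down is to convert the document one page at a time and record which pages raise the IndexError. The following is only a rough sketch: find_failing_pages and convert_page are hypothetical names introduced for illustration, and the commented usage assumes that to_markdown accepts a pages list of 0-based page numbers.

```python
from typing import Callable, List, Tuple


def find_failing_pages(
    page_count: int,
    convert_page: Callable[[int], str],
) -> List[Tuple[int, str]]:
    """Call convert_page for every page number and collect IndexError failures.

    convert_page is a hypothetical callback supplied by the caller; it is
    expected to convert one page (0-based number) and return its text.
    """
    failures = []
    for pno in range(page_count):
        try:
            convert_page(pno)
        except IndexError as exc:
            # Remember the page number and the error message, then continue.
            failures.append((pno, str(exc)))
    return failures


# Hypothetical usage (needs pymupdf, pymupdf-layout and pymupdf4llm installed):
#   import pymupdf.layout
#   import pymupdf4llm
#   doc = pymupdf.open(pdf_name)
#   bad = find_failing_pages(
#       doc.page_count,
#       lambda pno: pymupdf4llm.to_markdown(doc, pages=[pno]),
#   )
#   print(bad)
```

The failing pages could then be copied into a much smaller PDF that is easier to attach to a bug report.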

HaraldLieder · December 3, 2025, 5:26pm

This problem should have been fixed in pymupdf4llm version 0.2.6.
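
To check whether an environment already has the fixed release, something like the following could be run before parsing. This is a minimal sketch, not an official check: parse_version, has_fix and installed_has_fix are hypothetical helpers, and the version parsing is deliberately simple (leading digits of each dot-separated part only).

```python
from importlib import metadata

# Version named in this thread as containing the fix.
FIXED_VERSION = (0, 2, 6)


def parse_version(text):
    """Turn a string like '0.2.6' into a tuple of ints, e.g. (0, 2, 6).

    Parsing stops at the first part that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed):
    """Return True if the given version string is at least FIXED_VERSION."""
    return parse_version(installed) >= FIXED_VERSION


def installed_has_fix(package="pymupdf4llm"):
    """Return True/False for the installed package, or None if it is missing."""
    try:
        return has_fix(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None
```

If installed_has_fix() returns False, upgrading the package (for example with pip install -U pymupdf4llm) should pull in 0.2.6 or later.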

Jamie_Lemon · December 3, 2025, 9:47pm

@robvd Could you share the open data link to the PDFs? Hopefully the new PyMuPDF4LLM 0.2.6 resolves your issue; it did at least resolve the similar issue here: BUG: list index out of range using new layout feature - #10 by Jamie_Lemon

robvd · December 4, 2025, 8:19am

It is indeed working with version 0.2.6.

@Jamie_Lemon I stored this file locally at some point because it caused trouble, but unfortunately I did not save the original URL. If you want, I can send you the file, e.g. via WeTransfer. Just DM me your email address.
