Bug: `ValueError: min() iterable argument is empty` in `table.bbox` when calling `to_markdown()`

Describe the bug

I am using pymupdf4llm to extract Markdown text from a public PDF (an Italian law document downloaded from an institutional website). When calling pymupdf4llm.to_markdown(doc, page_chunks=True), the library crashes with a ValueError: min() iterable argument is empty originating from pymupdf/table.py.

It seems that the library detects a table on a page, but the table has no cells or coordinates, causing min() to fail when get_page_output attempts to append t.bbox to omitted_table_rects.

Traceback

File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/app/business_logic/utils/text_extraction_wrappers/text_pdf_to_markdown.py", line 13, in _extract_text_pdf_to_markdown_wrapper
    texts=[page_chunk["text"] for page_chunk in pymupdf_rag.to_markdown(doc, page_chunks=True)]
                                                ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/.venv/lib/python3.13/site-packages/pymupdf4llm/helpers/pymupdf_rag.py", line 1243, in to_markdown
    parms = get_page_output(
        doc,
    ...<5 lines>...
        IGNORE_GRAPHICS,
    )
  File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/.venv/lib/python3.13/site-packages/pymupdf4llm/helpers/pymupdf_rag.py", line 1063, in get_page_output
    omitted_table_rects.append(pymupdf.Rect(t.bbox))
                                            ^^^^^^
  File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/.venv/lib/python3.13/site-packages/pymupdf/table.py", line 1534, in bbox
    min(map(itemgetter(0), c)),
    ~~~^^^^^^^^^^^^^^^^^^^^^^^
ValueError: min() iterable argument is empty

Environment

  • OS: Ubuntu 24.04
  • Python: 3.13.12
  • PyMuPDF: 1.27.2.3
  • PyMuPDF4LLM: 1.27.2.3
  • PyMuPDF-Layout: 1.27.2.3

Attachments

Since this is a public law document, I am attaching the PDF file to help reproduce and debug the issue.

Additional context

I assume the issue could be mitigated by checking if the table object is empty before attempting to access its bbox property in pymupdf_rag.py.
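A minimal sketch of the kind of guard I mean. `FakeTable` is a hypothetical stand-in that only mirrors the `bbox` computation visible in the traceback (a `min()`/`max()` over the cell coordinates), and `safe_bbox` is an illustrative helper of my own, not something that exists in pymupdf4llm:

```python
from operator import itemgetter


class FakeTable:
    """Hypothetical stand-in: bbox is derived from the cells, as in the traceback."""

    def __init__(self, cells):
        self.cells = cells  # list of (x0, y0, x1, y1) tuples

    @property
    def bbox(self):
        c = self.cells
        return (
            min(map(itemgetter(0), c)),
            min(map(itemgetter(1), c)),
            max(map(itemgetter(2), c)),
            max(map(itemgetter(3), c)),
        )


def safe_bbox(table):
    """Return the table's bbox as a tuple, or None when it cannot be computed."""
    try:
        return tuple(table.bbox)
    except ValueError:
        # Raised by min()/max() when the table has no cells.
        return None


# One normal table and one table without cells (the failing case).
tables = [FakeTable([(10, 20, 50, 60), (50, 20, 90, 60)]), FakeTable([])]

# Only tables with a computable bbox contribute a rectangle.
rects = [b for b in map(safe_bbox, tables) if b is not None]
print(rects)
```

With a guard like this, a table without cells would simply be skipped instead of aborting the whole conversion. Whether skipping is the right behaviour for `omitted_table_rects` is for the maintainers to decide.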

36_2023.pdf (1,5 MB)

Hi @Cristiano_Casadei !

I am unable to replicate this. I am on:

  • OS: macOS Tahoe 26.3
  • Python: 3.14.3
  • PyMuPDF: 1.27.2.3
  • PyMuPDF4LLM: 1.27.2.3
  • PyMuPDF-Layout: 1.27.2.3

I’m just doing this:

import sys
import pymupdf
import pymupdf4llm

doc = pymupdf.open(sys.argv[1])
md = pymupdf4llm.to_markdown(
    doc,
    page_chunks=True
)

print(md)

And the Markdown prints as expected. Could this be related to the Python version?

No, I checked your code and it works with my version of Python too.
The problem is apparently that I use "from pymupdf4llm.helpers import pymupdf_rag" and call pymupdf_rag.to_markdown() directly.

I don’t remember exactly why I used that; it was probably related to some older version of pymupdf4llm.
I’ll check if the result is correct and change the code if necessary.

In the meantime, thank you for your interest and the insight you gave me!!

You are welcome - glad to help. Yes, that pymupdf_rag stuff seems like old legacy code to me! Happy coding! 🙂
