Describe the bug
I am using pymupdf4llm to extract Markdown text from a public PDF (an Italian law document downloaded from an institutional website). Calling pymupdf4llm.to_markdown(doc, page_chunks=True) raises ValueError: min() iterable argument is empty, which originates in pymupdf/table.py.
It seems that the library detects a table on a page, but the table has no cells and therefore no coordinates, so min() fails when get_page_output reads t.bbox in order to append it to omitted_table_rects.
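The suspected failure mode can be illustrated without the PDF. The sketch below uses a hypothetical stand-in class, FakeTable, whose bbox property follows the min(map(itemgetter(0), c)) pattern visible in the traceback. It is not PyMuPDF's actual Table implementation; the class name, the cells attribute, and the describe_bbox helper are assumptions made only for illustration.

```python
from operator import itemgetter


class FakeTable:
    """Hypothetical stand-in for a detected table; not PyMuPDF's class."""

    def __init__(self, cells):
        # Each cell is an (x0, y0, x1, y1) tuple.
        self.cells = cells

    @property
    def bbox(self):
        c = self.cells
        # Same pattern as the line shown in the traceback: min()/max()
        # over the cell coordinates, which raises on an empty sequence.
        return (
            min(map(itemgetter(0), c)),
            min(map(itemgetter(1), c)),
            max(map(itemgetter(2), c)),
            max(map(itemgetter(3), c)),
        )


def describe_bbox(table):
    """Return the bbox, or the error text if it cannot be computed."""
    try:
        return table.bbox
    except ValueError as exc:
        return f"ValueError: {exc}"


# A table with two cells yields the enclosing rectangle.
print(describe_bbox(FakeTable([(10, 20, 50, 60), (50, 20, 90, 60)])))
# A table with no cells triggers the ValueError raised by min().
print(describe_bbox(FakeTable([])))
```

If the real table object behaves like this stand-in, any detected table with an empty cell list would fail as soon as its bbox is read, regardless of which caller reads it.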
Traceback
File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/app/business_logic/utils/text_extraction_wrappers/text_pdf_to_markdown.py", line 13, in _extract_text_pdf_to_markdown_wrapper
texts=[page_chunk["text"] for page_chunk in pymupdf_rag.to_markdown(doc, page_chunks=True)]
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/.venv/lib/python3.13/site-packages/pymupdf4llm/helpers/pymupdf_rag.py", line 1243, in to_markdown
parms = get_page_output(
doc,
...<5 lines>...
IGNORE_GRAPHICS,
)
File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/.venv/lib/python3.13/site-packages/pymupdf4llm/helpers/pymupdf_rag.py", line 1063, in get_page_output
omitted_table_rects.append(pymupdf.Rect(t.bbox))
^^^^^^
File "/mnt/discoD/Progetti/Sorgenti/python/gitlab/AI-CORE-v2/SERVICES/init/.venv/lib/python3.13/site-packages/pymupdf/table.py", line 1534, in bbox
min(map(itemgetter(0), c)),
~~~^^^^^^^^^^^^^^^^^^^^^^^
ValueError: min() iterable argument is empty
Environment
- OS: Ubuntu 24.04
- Python: 3.13.12
- PyMuPDF: 1.27.2.3
- PyMuPDF4LLM: 1.27.2.3
- PyMuPDF-Layout: 1.27.2.3
Attachments
Since this is a public law document, I am attaching the PDF file to help reproduce and debug the issue.
Additional context
I assume the issue could be mitigated by checking if the table object is empty before attempting to access its bbox property in pymupdf_rag.py.
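A minimal sketch of that mitigation, under the same assumptions as the bug description: FakeTable is a hypothetical stand-in rather than PyMuPDF's Table class, and safe_bbox and collect_table_rects are illustrative helper names that do not exist in pymupdf_rag.py. The idea is only to show how tables without a computable bounding box could be skipped instead of aborting the whole conversion.

```python
from operator import itemgetter


class FakeTable:
    """Hypothetical stand-in for a detected table; not PyMuPDF's class."""

    def __init__(self, cells):
        self.cells = cells  # list of (x0, y0, x1, y1) tuples

    @property
    def bbox(self):
        c = self.cells
        return (
            min(map(itemgetter(0), c)),
            min(map(itemgetter(1), c)),
            max(map(itemgetter(2), c)),
            max(map(itemgetter(3), c)),
        )


def safe_bbox(table):
    """Return table.bbox, or None if the table has no usable cells."""
    try:
        return table.bbox
    except ValueError:
        # Raised by min()/max() when there are no cell coordinates.
        return None


def collect_table_rects(tables):
    """Collect bounding boxes, skipping tables that have none."""
    rects = []
    for t in tables:
        bbox = safe_bbox(t)
        if bbox is not None:
            rects.append(bbox)
    return rects


tables = [
    FakeTable([(0, 0, 10, 10)]),
    FakeTable([]),  # the problematic "empty" table
    FakeTable([(5, 5, 20, 30)]),
]
print(collect_table_rects(tables))
```

Catching ValueError rather than inspecting a specific attribute keeps the guard independent of how the real table object stores its cells, which is not confirmed here.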
36_2023.pdf (1.5 MB)