When parsing the attached file using pymupdf.layout+pymupdf4llm the following traceback is encountered:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/__init__.py", line 97, in to_markdown
    return parsed_doc.to_markdown(
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/helpers/document_layout.py", line 672, in to_markdown
    output += list_item_to_md(box.textlines, list_item_levels[i])
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/helpers/document_layout.py", line 391, in list_item_to_md
    line = textlines[0]
IndexError: list index out of range
Versions:
pymupdf4llm: 0.2.4
pymupdf-layout: 1.26.6
P.S. I could not attach the file because it was too large. Please tell me how to send it.
If you upgrade to the latest PyMuPDF4LLM version 0.2.5, everything should work fine. This script:
import sys
from pathlib import Path
import pymupdf.layout
import pymupdf4llm
print(f"{pymupdf.version=}, {pymupdf4llm.version=}")
doc = pymupdf.open(sys.argv[1])
md = pymupdf4llm.to_markdown(
    doc,
    write_images=False,  # do not write image files
    embed_images=False,  # do not embed images as base64 strings
    image_format="jpg",  # image format (embedded or written)
    header=True,  # include/omit page headers
    footer=False,  # include/omit page footers
    pages=None,
    show_progress=True,
)
Path(doc.name).with_suffix(".md").write_bytes(md.encode())
produces this console output:

Just one thing: if I convert multiple documents one after another, using
doc = pymupdf.open(pdf_name)
md_chunks = pymupdf4llm.to_markdown(doc)
then the Full-page OCR messages accumulate in the output. For example, when converting the first file the output was:
Full-page OCR on page.number=5/6.
Full-page OCR on page.number=14/15.
Full-page OCR on page.number=23/24.
After the second file the output was:
Full-page OCR on page.number=5/6.
Full-page OCR on page.number=14/15.
Full-page OCR on page.number=23/24.
Full-page OCR on page.number=0/1.
Full-page OCR on page.number=1/2.
So the output for the second file includes the lines from the first file. I have not looked into this in depth yet; I just wanted to mention it.
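In case it helps to narrow this down, here is a minimal sketch of how the messages could be compared per document. It assumes the messages are written through Python's sys.stdout (if they go to stderr or are written at the C level, this will not capture them). The helper name convert_each and its convert parameter are hypothetical, not part of PyMuPDF4LLM.

```python
import contextlib
import io


def convert_each(names, convert):
    """Run convert(name) for each document and capture its stdout separately.

    `convert` is any callable taking a file name and returning Markdown text.
    Returns {name: (markdown_or_None, captured_lines, exception_or_None)}.
    """
    results = {}
    for name in names:
        buffer = io.StringIO()  # fresh buffer for every document
        markdown = None
        error = None
        with contextlib.redirect_stdout(buffer):
            try:
                markdown = convert(name)
            except Exception as exc:  # e.g. an IndexError like the one in the traceback
                error = exc
        results[name] = (markdown, buffer.getvalue().splitlines(), error)
    return results
```

With the real library, convert could be something like lambda name: pymupdf4llm.to_markdown(pymupdf.open(name)). If the captured lines for the second document repeat the lines from the first, the messages are being carried over between calls rather than just scrolling past in the console.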