BUG: parameter page_chunks is ignored when passed to pymupdf4llm.to_markdown

When calling pymupdf4llm.to_markdown(doc, headers=keep_headers, footers=keep_footers, page_chunks=ret_page_chunks)

the page_chunks flag is ignored.
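A quick way to confirm the symptom is to check the return type. This is only a sketch: it assumes that `page_chunks=True` should yield a list of per-page dictionaries, while the default yields a single Markdown string. `result_kind` and `run_repro` are hypothetical helper names, not part of pymupdf4llm.

```python
def result_kind(result):
    """Classify what to_markdown returned (hypothetical helper)."""
    if isinstance(result, str):
        return "markdown string"
    if isinstance(result, list) and all(isinstance(item, dict) for item in result):
        return "page chunks"
    return "unexpected"


def run_repro(pdf_path):
    """Call to_markdown with page_chunks=True and report the result type."""
    # Imported lazily so the helper above can be used without the package.
    import pymupdf4llm

    result = pymupdf4llm.to_markdown(pdf_path, page_chunks=True)
    return result_kind(result)
```

If the flag is ignored as described, `run_repro("some.pdf")` would report "markdown string" rather than "page chunks".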

Looking at the source in
venv/lib/python3.11/site-packages/pymupdf4llm/__init__.py, line 97, it is passed on to

return parsed_doc.to_markdown

However, in venv/lib/python3.11/site-packages/pymupdf4llm/helpers/document_layout.py,
line 599, page_chunks is part of the discarded kwargs and is not used.

I am using
PyMuPDF~=1.26.6

pymupdf-layout~=1.26.6

pymupdf4llm~=0.2.6

on macOS and Linux.

BTW, I love PyMuPDF and you folks are great: fast responses, awesome product. It's very much appreciated :slight_smile:

I’ve uploaded version 0.2.7 over the weekend. It supports page_chunks again, for both to_markdown and the new to_text method.
Please be aware that the per-page dictionaries now contain a slightly changed set of keys:
"page_boxes" is a new list of identified and classified layout boundary boxes. They are in reading order, and the "text" follows this sequence. Each list item is a tuple (x0, y0, x1, y1, "class"), where class is one of "table", "picture", "text", …
"images"/"tables"/"words" are now omitted (i.e. they always have the value None).
"page" has been renamed to "page_number".
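Code that consumed the older chunk layout may need a small adapter. The following is a minimal sketch, not part of pymupdf4llm: `page_number_of` and `box_class_counts` are hypothetical helpers, and the sample chunk is hand-built from the key names above. Where exactly "page_number" lives inside a chunk is an assumption here, so the helper checks both the top level and a "metadata" sub-dictionary.

```python
from collections import Counter


def page_number_of(chunk):
    """Return the page number of a chunk, trying the new and old key names."""
    metadata = chunk.get("metadata") or {}
    # Assumption: the key may sit at the top level or under "metadata".
    for container in (chunk, metadata):
        for key in ("page_number", "page"):
            if container.get(key) is not None:
                return container[key]
    return None


def box_class_counts(chunk):
    """Count layout boxes per class; each box is (x0, y0, x1, y1, "class")."""
    boxes = chunk.get("page_boxes") or []
    return Counter(box[4] for box in boxes)


# Hand-built sample for illustration only, not real library output.
sample_chunk = {
    "metadata": {"page_number": 1},
    "text": "# Title\n\nSome text.",
    "page_boxes": [
        (72.0, 72.0, 540.0, 100.0, "text"),
        (72.0, 120.0, 540.0, 300.0, "table"),
    ],
    "images": None,
    "tables": None,
    "words": None,
}

print(page_number_of(sample_chunk), dict(box_class_counts(sample_chunk)))
```

Treating None as an empty list for "page_boxes" keeps the helpers usable on chunks from versions that do not provide that key.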


Awesome, thank you. I am grabbing it now :slight_smile:
