BUG: parameter page_chunks is ignored when passed to pymupdf4llm.to_markdown

qbuchanan · December 6, 2025, 7:34pm

When calling pymupdf4llm.to_markdown(doc, headers=keep_headers, footers=keep_footers, page_chunks=ret_page_chunks)

the page_chunks flag is ignored.
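A minimal sketch for confirming the symptom, assuming an ignored flag shows up as a single Markdown string coming back instead of a list of per-page dictionaries. The helper names `looks_like_page_chunks` and `check_pdf` are hypothetical and not part of pymupdf4llm; only `pymupdf4llm.to_markdown(..., page_chunks=True)` is the library's own call.

```python
def looks_like_page_chunks(result):
    """Return True if result is a list of per-page dicts, False otherwise."""
    return isinstance(result, list) and all(isinstance(c, dict) for c in result)


def check_pdf(path):
    """Call to_markdown with page_chunks=True and report what came back.

    The import is done lazily so this module loads without pymupdf4llm installed.
    """
    import pymupdf4llm  # requires the package under discussion

    result = pymupdf4llm.to_markdown(path, page_chunks=True)
    if looks_like_page_chunks(result):
        return f"page_chunks honored: {len(result)} page dicts"
    # Assumption: an ignored flag yields one Markdown string for the whole file.
    return f"page_chunks ignored: got {type(result).__name__}"
```

Calling `check_pdf("some.pdf")` on any local PDF (the file name is a placeholder) should tell the two behaviors apart.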

Looking at the source in
venv/lib/python3.11/site-packages/pymupdf4llm/__init__.py, line 97, it's passed to

return parsed_doc.to_markdown

However, in venv/lib/python3.11/site-packages/pymupdf4llm/helpers/document_layout.py,
line 599, page_chunks is part of the discarded kwargs and is not used.

I am using
PyMuPDF~=1.26.6

pymupdf-layout~=1.26.6

pymupdf4llm~=0.2.6

on mac and linux

btw - love PyMuPDF and you folks are great: fast responses, awesome product. It's very much appreciated.

HaraldLieder · December 8, 2025, 1:38pm

I uploaded version 0.2.7 over the weekend. It again supports page_chunks for both to_markdown and the new to_text methods.
Please be aware that the per-page dictionaries now contain a slightly changed set of keys:
"page_boxes" is a new list of identified and classified layout boundary boxes. They are in reading order, and the "text" follows this sequence. Each list item is a tuple (x0, y0, x1, y1, "class"), where class is "table", "picture", "text", …
"images"/"tables"/"words" are now omitted (i.e. they always have the value None).
"page" has been renamed to "page_number".
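For code that has to work with both the old and the new per-page dictionaries, a small adapter along these lines may help. This is a sketch under assumptions: the reply does not say whether the page number sits at the top level or inside a "metadata" dictionary, so both places are checked, and the sample chunk is made-up data shaped after the description above, not real library output.

```python
from collections import Counter


def page_number_of(chunk):
    """Return the page number under either the new or the old key name."""
    # Assumption: the key may live at the top level or inside "metadata".
    for holder in (chunk, chunk.get("metadata") or {}):
        for key in ("page_number", "page"):
            if key in holder:
                return holder[key]
    return None


def box_classes(chunk):
    """Count layout boxes per class; each box is (x0, y0, x1, y1, "class")."""
    boxes = chunk.get("page_boxes") or []
    return Counter(box[4] for box in boxes)


def tables_of(chunk):
    """Return the "tables" value as a list, treating None or a missing key as empty."""
    return chunk.get("tables") or []


# Hypothetical sample in the new shape (values invented for illustration).
sample_chunk = {
    "metadata": {"page_number": 1},
    "text": "# Title\n\nSome text",
    "page_boxes": [
        (50.0, 40.0, 500.0, 80.0, "text"),
        (50.0, 100.0, 500.0, 300.0, "table"),
        (50.0, 320.0, 500.0, 400.0, "text"),
    ],
    "images": None,
    "tables": None,
    "words": None,
}
```

Guarding with `or []` keeps loops over "images", "tables", or "words" from failing on the None values that 0.2.7 returns for those keys.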

qbuchanan · December 8, 2025, 3:18pm

Awesome, thank you. I am grabbing it now.
