Environment
- pymupdf version: 1.27.2.3
- pymupdf4llm version: 1.27.2.3
- Python: 3.12.3 (also reproduced on 3.14, Windows)
- OS: tested on Linux and Windows
Summary
Importing pymupdf4llm, without calling any of its functions, silently activates pymupdf.layout: pymupdf4llm/__init__.py calls pymupdf.layout.activate() at import time. This activation changes the behaviour of Page.find_tables() in pymupdf, which then returns a different number of tables for the same PDF page. The side effect is neither documented nor announced by a warning.
Steps to reproduce
Same code, same input file, same pymupdf version. The only difference is the presence of import pymupdf4llm.
Input file: D.lgs. 81/08 - Gennaio 2026 (Italian law on occupational health and safety)
Test A — without pymupdf4llm:
python
import fitz
doc = fitz.open("Law.pdf")
for page_num in range(23, 25):
    page = doc[page_num]
    tables = page.find_tables()
    print(f"Page {page_num + 1}: {len(tables.tables)} tables")
doc.close()
Output:
Page 24: 0 tables
Page 25: 0 tables
Test B — same code, with pymupdf4llm imported:
python
import fitz
import pymupdf4llm # only added line
doc = fitz.open("Law.pdf")
for page_num in range(23, 25):
    page = doc[page_num]
    tables = page.find_tables()
    print(f"Page {page_num + 1}: {len(tables.tables)} tables")
doc.close()
Output:
Page 24: 1 tables
Page 25: 1 tables
Test C — confirms the cause is the layout engine:
python
import fitz
import pymupdf4llm
pymupdf4llm.use_layout(False)
doc = fitz.open("Law.pdf")
for page_num in range(23, 25):
    page = doc[page_num]
    tables = page.find_tables()
    print(f"Page {page_num + 1}: {len(tables.tables)} tables")
doc.close()
Output (matches Test A):
Page 24: 0 tables
Page 25: 0 tables
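Because the activation appears to be process-wide, Tests A and B are easiest to compare when each runs in its own interpreter, so that an earlier import cannot leak into the next run. The following is a minimal sketch of such isolation using only the standard library. `run_isolated` is a hypothetical helper name, and the two snippets are stand-ins that merely check which modules are loaded; in practice each snippet would hold the body of Test A or Test B.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> str:
    """Run a Python snippet in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process exits with an error
    )
    return result.stdout.strip()


# Stand-in snippets: the second one imports an extra module, the first does not.
without_import = run_isolated("import sys; print('colorsys' in sys.modules)")
with_import = run_isolated("import sys, colorsys; print('colorsys' in sys.modules)")

print("without import:", without_import)
print("with import:", with_import)
```

Each call starts from a clean `sys.modules`, so the result of one run cannot depend on what a previous run imported.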
Root cause
In pymupdf4llm/__init__.py (lines 42–48):
python
# Always attempt to use Layout by default.
try:
    import pymupdf.layout
except ImportError as e:
    use_layout(False)
else:
    use_layout(True)
The use_layout(True) call executes pymupdf.layout.activate() at import time, globally altering pymupdf behaviour.
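The mechanism can be illustrated without either library. The toy below is only a sketch: `core` and `helper` are hypothetical stand-ins for pymupdf and pymupdf4llm, and `layout_active` is an invented flag, not a real attribute of either package. It shows how a module body that runs at import time can change what an unrelated-looking call returns.

```python
import sys
import types

# Hypothetical stand-in for pymupdf: behaviour depends on a process-wide flag.
core = types.ModuleType("core")
core.layout_active = False
core.find_tables = lambda: ["table"] if core.layout_active else []
sys.modules["core"] = core  # make "import core" resolve to this object

# Hypothetical stand-in for pymupdf4llm/__init__.py: its body flips the flag.
HELPER_BODY = """
import core
core.layout_active = True
"""

before = core.find_tables()

helper = types.ModuleType("helper")
exec(HELPER_BODY, helper.__dict__)  # run the module body, as a first import would

after = core.find_tables()

print(before, after)
```

The caller never touches `layout_active` directly, yet the same `find_tables()` call returns a different result after the helper's body has run.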
Why this is a problem
- Users importing pymupdf4llm purely for its to_markdown function may not be aware that other pymupdf features they use elsewhere in the code will behave differently.
- The change is silent: no warning, no print, no documentation note in the README.
- It violates the principle of least surprise: importing a module should not modify the runtime behaviour of another, independent module.
- Code that worked correctly before adding import pymupdf4llm may start producing different results without any visible cause, which is hard to debug.
Suggested fixes (one or more)
- Do not call use_layout(True) automatically at import. Require an explicit call by the user.
- If automatic activation is preferred for the package’s primary use case, at least emit a warnings.warn(...) at import time explaining the side effect.
- Document the side effect prominently in the README and in the API reference.
- Consider scoping the layout activation to within to_markdown / to_json / to_text calls (activate on entry, deactivate on exit), so that pymupdf behaviour outside these calls remains unchanged.
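The warning and scoping suggestions above could look roughly like the sketch below. It is not how pymupdf4llm is implemented. `layout_scope`, `activate_with_warning`, `set_layout` and `is_active` are hypothetical names, and the demonstration drives them with a fake in-memory switch rather than the real layout engine. In particular, whether the library exposes a way to query the current activation state is an assumption here.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def layout_scope(set_layout, is_active):
    """Enable layout for the duration of a block, then restore the prior state.

    set_layout and is_active are hypothetical callables supplied by the caller.
    """
    previous = is_active()
    set_layout(True)
    try:
        yield
    finally:
        set_layout(previous)  # runs even if the block raises


def activate_with_warning(set_layout):
    """Alternative: keep automatic activation but announce it."""
    warnings.warn(
        "pymupdf.layout has been activated; Page.find_tables() results may change.",
        UserWarning,
        stacklevel=2,
    )
    set_layout(True)


# Demonstration with a fake switch standing in for the real activation call.
state = {"active": False}


def fake_set(flag):
    state["active"] = flag


def fake_get():
    return state["active"]


with layout_scope(fake_set, fake_get):
    inside = fake_get()
outside = fake_get()

print(inside, outside)
```

Restoring the previous state, rather than always switching off, keeps the scope safe for users who enabled layout explicitly. Since the switch is process-wide, a scope like this would not by itself be safe under concurrent use from several threads.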
Related
Related to, but distinct from, issue #4833 in the pymupdf repository, which concerns missing documentation about pymupdf.layout and OCR. The issue here is the automatic, silent activation by pymupdf4llm.