Importing pymupdf4llm silently activates pymupdf.layout and changes find_tables() results

Environment

  • pymupdf version: 1.27.2.3

  • pymupdf4llm version: 1.27.2.3

  • Python: 3.12.3 (also reproduced on 3.14, Windows)

  • OS: tested on Linux and Windows

Summary

Simply importing pymupdf4llm, without using any of its functions, silently activates pymupdf.layout: pymupdf4llm/__init__.py calls pymupdf.layout.activate() at import time. This activation changes the behaviour of Page.find_tables() in pymupdf, which then returns a different number of tables for the same PDF page. The side effect is neither documented nor announced via any warning.

Steps to reproduce

Same code, same input file, same pymupdf version. Only difference: presence of import pymupdf4llm.

File: D.lgs. 81/08 - Gennaio 2026 (January 2026 edition of the Italian law on occupational health and safety)

Test A — without pymupdf4llm:

```python
import fitz
doc = fitz.open("Law.pdf")
for page_num in range(23, 25):
    page = doc[page_num]
    tables = page.find_tables()
    print(f"Page {page_num + 1}: {len(tables.tables)} tables")
doc.close()
```

Output:

Page 24: 0 tables
Page 25: 0 tables

Test B — same code, with pymupdf4llm imported:

```python
import fitz
import pymupdf4llm  # only added line
doc = fitz.open("Law.pdf")
for page_num in range(23, 25):
    page = doc[page_num]
    tables = page.find_tables()
    print(f"Page {page_num + 1}: {len(tables.tables)} tables")
doc.close()
```

Output:

Page 24: 1 tables
Page 25: 1 tables

Test C — confirms the cause is the layout engine:

```python
import fitz
import pymupdf4llm
pymupdf4llm.use_layout(False)  # switch the layout engine off again
doc = fitz.open("Law.pdf")
for page_num in range(23, 25):
    page = doc[page_num]
    tables = page.find_tables()
    print(f"Page {page_num + 1}: {len(tables.tables)} tables")
doc.close()
```

Output (matches Test A):

Page 24: 0 tables
Page 25: 0 tables
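Because the side effect is process-wide, comparing the variants inside a single interpreter is unreliable: once pymupdf4llm has been imported anywhere, later find_tables() calls are affected unless use_layout(False) is called, as in Test C. A minimal sketch, assuming pymupdf and pymupdf4llm are installed in the interpreter that runs it, executes each variant in a fresh subprocess; the helper names run_isolated and compare are illustrative and not part of either library.

```python
import subprocess
import sys
import textwrap

# Body shared by Test A and Test B ("Law.pdf" and pages 24-25 as above).
TEST_BODY = textwrap.dedent("""\
    doc = fitz.open("Law.pdf")
    for page_num in range(23, 25):
        page = doc[page_num]
        tables = page.find_tables()
        print(f"Page {page_num + 1}: {len(tables.tables)} tables")
    doc.close()
""")


def run_isolated(code: str) -> str:
    """Execute `code` in a brand-new interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the snippet fails
    )
    return result.stdout


def compare(body: str = TEST_BODY) -> tuple[str, str]:
    """Return (stdout without pymupdf4llm, stdout with pymupdf4llm imported)."""
    without_llm = run_isolated("import fitz\n" + body)
    with_llm = run_isolated("import fitz\nimport pymupdf4llm\n" + body)
    return without_llm, with_llm
```

Calling compare() from the directory containing Law.pdf should reproduce the Test A and Test B outputs side by side, with no chance of one variant leaking state into the other.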

Root cause

In pymupdf4llm/__init__.py (lines 42–48):

```python
# Always attempt to use Layout by default.
try:
    import pymupdf.layout
except ImportError as e:
    use_layout(False)
else:
    use_layout(True)
```

The use_layout(True) call executes pymupdf.layout.activate() at import time, globally altering pymupdf behaviour.
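The mechanism is ordinary Python import semantics: a package's top-level code runs on first import, and the resulting module object is cached in sys.modules and shared by every importer in the process. The toy sketch below reproduces the pattern with two throwaway modules; demo_engine and demo_plugin are hypothetical stand-ins for pymupdf and pymupdf4llm, and the "one extra table" is simulated, not pymupdf's real detection logic.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# demo_engine plays pymupdf: its result depends on a module-level flag.
# demo_plugin plays pymupdf4llm: its module body flips that flag on import.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "demo_engine.py").write_text(
    "LAYOUT_ACTIVE = False\n"
    "\n"
    "def activate():\n"
    "    global LAYOUT_ACTIVE\n"
    "    LAYOUT_ACTIVE = True\n"
    "\n"
    "def find_tables():\n"
    "    # Simulated: the 'layout engine' reports one extra table.\n"
    "    return 1 if LAYOUT_ACTIVE else 0\n"
)
(tmp_dir / "demo_plugin.py").write_text(
    "import demo_engine\n"
    "demo_engine.activate()  # runs at import time, like use_layout(True)\n"
)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

engine = importlib.import_module("demo_engine")
before = engine.find_tables()           # plugin not imported yet
importlib.import_module("demo_plugin")  # the import alone changes engine state
after = engine.find_tables()            # same call, different result
print(before, after)  # prints "0 1"
```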

Why this is a problem

  1. Users importing pymupdf4llm purely for its to_markdown function may not be aware that other pymupdf features they use elsewhere in the code will behave differently.

  2. The change is silent: no warning, no print, no documentation note in the README.

  3. It violates the principle of least surprise: importing a module should not modify the runtime behaviour of another, independent module.

  4. Code that worked correctly before adding import pymupdf4llm may start producing different results without any visible cause, which is hard to debug.

Suggested fixes (one or more)

  1. Do not call use_layout(True) automatically at import. Require an explicit call by the user.

  2. If automatic activation is preferred for the package’s primary use case, at least emit a warnings.warn(...) at import time explaining the side effect.

  3. Document the side effect prominently in the README and in the API reference.

  4. Consider scoping the layout activation to within to_markdown / to_json / to_text calls (activate on entry, deactivate on exit), so that pymupdf behaviour outside these calls remains unchanged.
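Suggested fix 4 can be sketched with contextlib.contextmanager. This is a minimal sketch, assuming the toggle is a callable such as pymupdf4llm.use_layout, which Test C shows can switch the behaviour off again; scoped_layout and to_markdown_scoped are hypothetical names, not existing pymupdf4llm API.

```python
import contextlib
from typing import Any, Callable, Iterator


@contextlib.contextmanager
def scoped_layout(set_layout: Callable[[bool], None]) -> Iterator[None]:
    """Enable the layout engine only for the duration of the with-block.

    `set_layout` is whatever toggles the engine (e.g. pymupdf4llm.use_layout).
    The finally clause switches layout off on normal exit and on exceptions,
    so pymupdf calls outside the block keep the non-layout behaviour.
    """
    set_layout(True)
    try:
        yield
    finally:
        set_layout(False)


def to_markdown_scoped(
    convert: Callable[..., str],
    set_layout: Callable[[bool], None],
    *args: Any,
    **kwargs: Any,
) -> str:
    """Hypothetical wrapper: run a converter with layout active only inside."""
    with scoped_layout(set_layout):
        return convert(*args, **kwargs)
```

From user code this could look like wrapping a pymupdf4llm.to_markdown(doc) call in `with scoped_layout(pymupdf4llm.use_layout):`, after calling pymupdf4llm.use_layout(False) right after the import. Two caveats: the sketch always switches layout off on exit instead of restoring the previous mode, because no getter for the current mode appears in the code quoted above; and since the flag is process-wide, the scope is not safe to use from several threads at once.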

Related

This is related to, but distinct from, issue #4833 in the pymupdf repository, which concerns missing documentation about pymupdf.layout and OCR. The issue here is the automatic, silent activation by pymupdf4llm.

Welcome @Pierpaolo_Ferrante!

Thanks for the feedback. The goal of any software project is to improve, and better table detection, smarter OCR decisions and more accurate layout handling, while outperforming other tools in processing speed and resource requirements, are exactly the kind of progress we want to make.

The only part that may feel surprising is that we still keep the older, less capable behavior around. That’s intentional: the non-layout mode has even lower resource requirements, and some users depend on that or need more time to adjust their pipelines. The real issue isn’t the behavior itself; we could have communicated it more prominently and more exhaustively.

Our plan is simple:

  • Keep improving the layout engine as a priority, and keep it activated by default.
  • Keep the lightweight mode of PyMuPDF4LLM available for some time for those who need it, and document the differences clearly so no one is caught by surprise.