Hi there,
I am running into an issue: we modify the PDF (deleting text) before running extraction, and this seems to corrupt the PDF in a way that crashes pymupdf4llm. page.find_tables() returns None instead of raising, and the None is not handled on the next line:
pymupdf_rag.py:1031-1032
tabs = page.find_tables(clip=parms.clip,
                        strategy=table_strategy)
for t in tabs.tables:  # No None check!
The fix would be:
tabs = page.find_tables(clip=parms.clip,
                        strategy=table_strategy)
if tabs is not None:
    for t in tabs.tables:
Or maybe there should be a flag to choose between raising and skipping the tables?
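
To make the flag idea concrete, here is a rough sketch of how the guard could look as a small helper. The names safe_tables and on_missing are hypothetical and not part of PyMuPDF or pymupdf4llm; the only assumption about the library is what the snippet above already shows, namely that find_tables() returns either None or an object with a .tables attribute.

```python
def safe_tables(page, on_missing="ignore", **kwargs):
    """Return the tables found on `page` as a list, tolerating a None result.

    Hypothetical helper: `on_missing` is an illustrative flag, not an
    existing pymupdf4llm option.
    """
    tabs = page.find_tables(**kwargs)
    if tabs is None:
        if on_missing == "raise":
            # Surface the problem explicitly instead of an AttributeError later.
            raise RuntimeError("page.find_tables() returned None")
        # Default: treat the page as having no tables and carry on.
        return []
    return list(tabs.tables)
```

With something like this, the loop would become `for t in safe_tables(page, clip=parms.clip, strategy=table_strategy):`, and a page with a broken table finder would simply contribute no tables unless on_missing="raise" is passed.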