Pymupdf4llm unexpected reordering of output after v0.0.17

Hello, and thank you for the project! I am using pymupdf4llm to convert PDFs to Markdown. For the documents I work with, the conversion quality was best in v0.0.17. I’ve been tracking the issues others have opened since then (for example, GitHub issues #261 and #289), and neither the advice given there nor the subsequent releases have resolved the problems I’ve seen. I also saw the thread "Better line structure in earlier versions, what happened?" and figured I could help by providing a specific example of the biggest blocker I have in upgrading.

I used this PDF as input (all of the information inside is fake): input.pdf (81.5 KB)

After upgrading to v0.0.18, some paragraphs moved from the middle or bottom of the page to the top of the page in the Markdown output.

The conversion script just follows the example in the documentation:
```python
import pathlib
import pymupdf4llm

md_text = pymupdf4llm.to_markdown("input.pdf")
pathlib.Path("output.md").write_bytes(md_text.encode())
```

I originally planned to include output files for v0.0.17, v0.0.18, and the latest v0.0.27, but I am limited to 2 links.

Here is a screenshot of the git diff of the output between v0.0.17 and v0.0.18, showing the unwanted relocated paragraphs:
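For anyone who wants to check for the same relocation in text form, here is a minimal sketch using only the standard library's `difflib`. It assumes the Markdown output of each version has already been saved to a file and that paragraphs are separated by blank lines. The helper names `split_paragraphs` and `relocated_paragraphs` are my own, not part of pymupdf4llm.

```python
import difflib


def split_paragraphs(md_text: str) -> list[str]:
    """Split Markdown text into non-empty paragraphs on blank lines."""
    return [p.strip() for p in md_text.split("\n\n") if p.strip()]


def relocated_paragraphs(old_md: str, new_md: str) -> list[tuple[str, int, int]]:
    """Find paragraphs that exist in both outputs but changed relative order.

    Returns (paragraph, old_index, new_index) tuples. If a paragraph is
    duplicated in the new output, the last occurrence is reported.
    """
    old_pars = split_paragraphs(old_md)
    new_pars = split_paragraphs(new_md)

    # Paragraphs inside matching blocks kept their relative order.
    matcher = difflib.SequenceMatcher(a=old_pars, b=new_pars, autojunk=False)
    in_order = set()
    for block in matcher.get_matching_blocks():
        in_order.update(range(block.a, block.a + block.size))

    # Anything outside a matching block that still appears in the new
    # output was moved rather than deleted.
    new_positions = {p: i for i, p in enumerate(new_pars)}
    return [
        (p, i, new_positions[p])
        for i, p in enumerate(old_pars)
        if i not in in_order and p in new_positions
    ]
```

Passing the text of the v0.0.17 output as `old_md` and the v0.0.18 output as `new_md` should list each moved paragraph with its old and new position. Paragraphs whose text changed between versions are not reported, since only exact matches are compared.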

Hope this helps!

@grantbdev_mut Thanks for this report and welcome to the forum! I think we will need to take a closer look at the diffs between v0.0.17 and v0.0.18 to find out what has changed.