Is it possible to use pymupdf4llm.to_markdown with FastAPI UploadFile?

Hi,

I’m using FastAPI to upload PDFs, and I want to parse them with pymupdf4llm.to_markdown to maintain their structure. This worked fine with file paths during local development and testing, but UploadFile gives me bytes (await file.read()), and it seems to_markdown doesn’t accept in-memory files or BytesIO.

Is there a way to use to_markdown without saving the file to disk? I’d like to keep everything in-memory if possible.

Thanks!

Hi @marsea and welcome to the forum!

I think you can open the document from an in-memory stream and then pass the resulting document object to pymupdf4llm, along these lines:

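import pymupdf
import pymupdf4llm
import requests

# Download the PDF and keep its raw bytes in memory
r = requests.get('https://mupdf.com/docs/mupdf_explored.pdf')
data = r.content

# Open the document directly from the bytes; nothing is written to disk
doc = pymupdf.Document(stream=data)

# to_markdown accepts an open Document as well as a file path
md_text = pymupdf4llm.to_markdown(doc, show_progress=True)

print(md_text)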


Hopefully I’ve understood what you are looking for here okay - please let me know how that goes for you!