Is it possible to use pymupdf4llm.to_markdown with FastAPI UploadFile?

marsea · August 28, 2025, 2:07pm

Hi,

I’m using FastAPI to upload PDFs, and I want to parse them with pymupdf4llm.to_markdown to maintain their structure. This worked fine with file paths during local development and testing, but UploadFile gives me bytes (await file.read()), and it seems to_markdown doesn’t accept in-memory files or BytesIO.

Is there a way to use to_markdown without saving the file to disk? I’d like to keep everything in-memory if possible.

Thanks!

Jamie_Lemon · August 28, 2025, 5:04pm

Hi @marsea and welcome to the forum!

I think you can open the document from an in-memory stream and then pass the resulting Document object to pymupdf4llm, along these lines:

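import pymupdf
import pymupdf4llm
import requests

# Download a sample PDF into memory (nothing is written to disk)
r = requests.get('https://mupdf.com/docs/mupdf_explored.pdf')
r.raise_for_status()
data = r.content  # raw PDF bytes

# Open the document from the in-memory bytes instead of a file path
doc = pymupdf.Document(stream=data, filetype="pdf")

# to_markdown accepts an open Document as well as a path
md_text = pymupdf4llm.to_markdown(doc, show_progress=True)

print(md_text)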

Hopefully I’ve understood what you are looking for here okay - please let me know how that goes for you!
