Issue: Hyperlink extraction from pdf to markdown is not working

r_s1 · January 8, 2026, 2:23pm

I have the following code to convert PDF files into Markdown format, but the hyperlinks present in the PDF are not being preserved or converted correctly into Markdown.

import pymupdf
import pymupdf4llm

doc = pymupdf.open(file_path)  # file_path: path to the input PDF
md_text = pymupdf4llm.to_markdown(
    doc,
    header=False,
    footer=False,
    embed_images=True,
    dpi=300,
    use_ocr=True
)

Jamie_Lemon · January 8, 2026, 4:59pm

This feature is not yet supported in PyMuPDF4LLM. Links are a little tricky to deal with, as they can be internal (linking to other areas of the doc) or external (a website). Many of the links that you get from the Page API (see Page - PyMuPDF documentation) will also be invisible rects overlaid on areas of the doc. For text that is obviously a website link, you could post-process the resulting MD and look for inline URLs, e.g. if the text body contains https://, wrap that URL in the correct Markdown link syntax.
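A rough sketch of that post-processing idea, in case it helps. None of this is part of PyMuPDF4LLM: the names wrap_bare_urls and list_external_links are made up for illustration. The first function only uses the standard library and turns bare http(s) URLs in the Markdown into autolinks (<https://...>), skipping URLs that already sit inside a Markdown link. The second one assumes PyMuPDF's Page.get_links() and just lists the external (URI) links together with whatever text lies under each link rectangle, so you can decide yourself how to merge them into the output.

```python
import re

# http(s) URLs that are not already preceded by "<", "(" or "[",
# i.e. not already part of a Markdown link or autolink.
BARE_URL = re.compile(r"(?<![<(\[])\bhttps?://[^\s<>()\[\]]+")


def wrap_bare_urls(md_text: str) -> str:
    """Wrap bare URLs in Markdown autolink syntax (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        url = match.group(0)
        # Trailing punctuation usually belongs to the sentence, not the URL.
        trimmed = url.rstrip(".,;:!?")
        tail = url[len(trimmed):]
        return f"<{trimmed}>{tail}"

    return BARE_URL.sub(repl, md_text)


def list_external_links(file_path: str) -> list[tuple[int, str, str]]:
    """Return (page number, text under the link rect, URI) for external links.

    Hypothetical helper; assumes PyMuPDF is installed. Internal links
    (jumps inside the document) are ignored here.
    """
    import pymupdf  # imported lazily so wrap_bare_urls works without it

    found = []
    with pymupdf.open(file_path) as doc:
        for page in doc:
            for link in page.get_links():
                if link["kind"] == pymupdf.LINK_URI:
                    label = page.get_textbox(link["from"]).strip()
                    found.append((page.number, label, link["uri"]))
    return found
```

You would call wrap_bare_urls(md_text) on the string returned by to_markdown. It is only a heuristic: URLs containing parentheses, or URLs nested inside another URL's query string, are not handled properly.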
