Extracted page text includes annotations page_text = page.get_text("text")

bik123 · March 8, 2026, 5:56am

Extracted page text includes annotations (type FreeText)

When extracting text using:

page_text = page.get_text("text")

The text from annotations of type FreeText is included in the extracted page text.

Example workflow:

import pymupdf

with pymupdf.open(pdf_path) as doc:
    for page in doc:
        page_text = page.get_text("text")

The extracted page_text contains text that originates from FreeText annotations, even though this text is not part of the page content stream (/Contents).

Inspecting the raw page content confirms the annotation text is not present there:

xref_list = page.get_contents()
for xref in xref_list:
    stream = doc.xref_stream(xref)
    print(stream[:500])

The text appears to come from the annotation appearance stream (/Annots -> /AP), which get_text() seems to include.
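
One way to check this is to list the FreeText annotations on the page and compare their contents with the extracted text. A minimal sketch, assuming `page` is a PyMuPDF page; the helper name `freetext_contents` is made up for illustration, and it relies only on `page.annots()`, `annot.type` and `annot.info`:

```python
def freetext_contents(page):
    """Return the text content of every FreeText annotation on a page.

    `page` is expected to be a pymupdf.Page; the helper name is hypothetical.
    """
    contents = []
    for annot in page.annots():
        # annot.type is a pair such as (2, 'FreeText')
        if annot.type[1] == "FreeText":
            contents.append(annot.info.get("content", ""))
    return contents


# Example use (pdf_path as in the workflow above):
#   with pymupdf.open(pdf_path) as doc:
#       for page in doc:
#           print(page.number, freetext_contents(page))
```

If the strings returned here also show up in page_text, that supports the idea that the extra text originates from the annotations rather than from /Contents.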

Question

How can I extract page_text so that it contains only the page's own text, without the text from FreeText annotations?

Jamie_Lemon · March 9, 2026, 2:10pm

Hi @bik123 Welcome to the forum and thanks for your post.

I think the trick is to delete the FreeText annotations from each page before extracting its text: as you iterate through the pages, select the annotation types you want to remove and delete them.

This worked for me:

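import pymupdf

src = pymupdf.open(pdf_path)  # pdf_path: path to your PDF, as in the question
for page in src:
    # Collect the xrefs first so annotations are not deleted while iterating over them
    xrefs = [annot.xref for annot in page.annots(types=[pymupdf.PDF_ANNOT_FREE_TEXT])]
    for xref in xrefs:
        a = page.load_annot(xref)
        page.delete_annot(a)

    # The deletions only affect the in-memory document unless it is saved
    text = page.get_text()
    print(text)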
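
If you would rather not delete anything, another option might be to filter the extracted text blocks by position instead. This is only a rough sketch, assuming (as reported above) that the annotation text comes back as its own block inside the annotation's rectangle. The helper names are hypothetical, and any genuine page text that overlaps a FreeText rectangle would be dropped as well:

```python
def rects_intersect(a, b):
    """True if two (x0, y0, x1, y1) rectangles overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def drop_blocks_in_rects(blocks, rects):
    """Keep only the blocks whose bounding box touches none of the rects."""
    return [
        b for b in blocks
        if not any(rects_intersect(b[:4], r) for r in rects)
    ]


def text_without_freetext(page):
    """Join the text blocks of `page` that lie outside FreeText annotations.

    `page` is expected to be a pymupdf.Page. Blocks from get_text("blocks")
    are tuples (x0, y0, x1, y1, text, block_no, block_type), where
    block_type 0 means a text block.
    """
    rects = [tuple(a.rect) for a in page.annots() if a.type[1] == "FreeText"]
    kept = drop_blocks_in_rects(page.get_text("blocks"), rects)
    return "".join(b[4] for b in kept if b[6] == 0)
```

You would call text_without_freetext(page) in place of page.get_text() inside the loop. Deleting the annotations as shown above is simpler and does not depend on block positions, so I would try that first.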
