Flatten the shapes

I have a PDF with comments/markups, such as various shapes or text. Now I want to flatten (render) the shapes in the PDF while preserving the text as it is, without flattening it. The shapes also don't need to show up in the comments panel.

Could anyone please help me with this?

@ever4andrews Welcome! Did you try Document.bake() ? If I have understood you correctly it might provide what you need.


Excellent, it works for me. I had been struggling with this for the last few days.
Thank you so much @Jamie_Lemon


Hi @Jamie_Lemon,

Could you please help me with the requirements below? I am writing a PDF converter script that flattens annotations and standardizes the page size to Letter:

  1. Flatten the annotations (this already works with doc.bake()).
  2. The input PDF pages may or may not be Letter size, but the output must be Letter only, with the text preserved.
  3. If an input page was scanned, we can handle it with the insert_image() method.

So far I have tried multiple methods, but if I keep the annotations I lose the “Letter” size, and vice versa.

Kindly help me out, because I don’t have much time to explore this further.

Hi @ever4andrews.

In your case, because your input PDF can have any page dimensions but you always need it to fit “Letter” size (presumably portrait), I think the best thing you can do is flatten each page to an image and then scale it as well as possible onto new Letter pages.

The script below should do this okay:


import pymupdf

def flatten_pdf_to_letter_size(input_pdf, output_pdf, dpi=150):
    """
    Flatten a PDF by converting each page to an image on Letter-sized pages.
    
    Args:
        input_pdf: Path to input PDF file
        output_pdf: Path to output PDF file
        dpi: Resolution for rendering (default 150)
    """
    # See: https://pymupdf.readthedocs.io/en/latest/functions.html#paper_size
    letter_size = pymupdf.paper_size("letter")
    LETTER_WIDTH = letter_size[0]
    LETTER_HEIGHT = letter_size[1]

    # Open the input PDF
    doc = pymupdf.open(input_pdf)
    
    # Create a new PDF for output
    output_doc = pymupdf.open()
    
    # Process each page
    for page_num in range(len(doc)):
        page = doc[page_num]
        
        # Render page to a pixmap (image)
        zoom = dpi / 72
        mat = pymupdf.Matrix(zoom, zoom)
        pix = page.get_pixmap(matrix=mat)
        
        # Create a new Letter-sized page
        new_page = output_doc.new_page(width=LETTER_WIDTH, height=LETTER_HEIGHT)
        
        # Get the image dimensions in points (convert back from pixels)
        img_width = pix.width / zoom
        img_height = pix.height / zoom
        
        # Calculate scaling factor to fit within Letter size
        scale_x = LETTER_WIDTH / img_width
        scale_y = LETTER_HEIGHT / img_height
        scale = min(scale_x, scale_y)  # Use smaller scale to fit entirely
        
        # Calculate centered position
        scaled_width = img_width * scale
        scaled_height = img_height * scale
        x_offset = (LETTER_WIDTH - scaled_width) / 2
        y_offset = (LETTER_HEIGHT - scaled_height) / 2
        
        # Create rectangle for image placement
        img_rect = pymupdf.Rect(x_offset, y_offset,
                                x_offset + scaled_width,
                                y_offset + scaled_height)
        
        # Insert the image
        new_page.insert_image(img_rect, pixmap=pix)
    
    # Save the flattened PDF
    output_doc.save(output_pdf)
    output_doc.close()
    doc.close()
    
    print(f"Flattened PDF saved to {output_pdf}")


Usage


flatten_pdf_to_letter_size("input.pdf", "output_flattened.pdf", dpi=150)

Hope this helps!

Thanks for your support, @Jamie_Lemon.
Yes, I tried this.

But my requirement is that if a page is digital (not scanned) and has a different page size, the output page must still preserve the text and fit it to “Letter”.

In any case, I have already handled scanned pages with insert_image().

Well, in this case you should extract the information and then add it all to new Letter-format pages. Doing this while respecting the design of the original page might be a challenge. Have a look at the Text and Page sections of the PyMuPDF documentation.

Alternatively, you could simply scale the content into your Letter PDF, something like:



import pymupdf

doc = pymupdf.open()  # new empty PDF
fmt = pymupdf.paper_rect("letter")
page = doc.new_page(width=fmt.width, height=fmt.height)

src = pymupdf.open("input.pdf")  # show page 0 of this

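# Scale factor (2.0 = 200%, 0.5 = 50%)
scale_factor = 0.5

# Create the transformation matrix
matrix = pymupdf.Matrix(scale_factor, scale_factor)

# Show page 0 of src inside the Letter page rectangle scaled by scale_factor
page.show_pdf_page(
    page.rect * matrix,  # new size (target rectangle)
    src,
    0,  # source page number
)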

doc.save("output.pdf")