I have a PDF with comments/markups such as various shapes and text. I want to flatten (render) the shapes that are in the PDF, but preserve the text as it is, without flattening it. The flattened items also should no longer show up in the comments panel.
Could anyone please help me with this?
@ever4andrews Welcome! Did you try Document.bake()? If I have understood you correctly, it might provide what you need.
Excellent, it works for me. I had been struggling with this for the last few days.
Thank you so much, @Jamie_Lemon
Hi @Jamie_Lemon,
Could you please help me with the requirements below? I am writing a PDF converter script that flattens annotations and standardizes the page size to Letter.
- Flattening the annotations works with doc.bake().
- The input PDF pages may or may not be “Letter” size, but I need the output to be “Letter” only, with the text preserved.
- If an input page was scanned, we can place it with the insert_image() method.
So far I have tried multiple methods, but if I keep the annotations I lose the “Letter” size, or vice versa.
Kindly help me out with this, because I don’t have much time to explore it further.
Hi @ever4andrews.
Because your input PDF can have any page dimensions but you always need it to fit “Letter” size (presumably portrait), I think the best thing you can do is flatten each page to an image and then scale it as well as possible onto the new Letter pages.
The script below should do this:
import pymupdf

def flatten_pdf_to_letter_size(input_pdf, output_pdf, dpi=150):
    """
    Flatten a PDF by converting each page to an image on Letter-sized pages.

    Args:
        input_pdf: Path to input PDF file
        output_pdf: Path to output PDF file
        dpi: Resolution for rendering (default 150)
    """
    # See: https://pymupdf.readthedocs.io/en/latest/functions.html#paper_size
    letter_size = pymupdf.paper_size("letter")
    LETTER_WIDTH = letter_size[0]
    LETTER_HEIGHT = letter_size[1]

    # Open the input PDF
    doc = pymupdf.open(input_pdf)

    # Create a new PDF for output
    output_doc = pymupdf.open()

    # Process each page
    for page_num in range(len(doc)):
        page = doc[page_num]

        # Render page to a pixmap (image)
        zoom = dpi / 72
        mat = pymupdf.Matrix(zoom, zoom)
        pix = page.get_pixmap(matrix=mat)

        # Create a new Letter-sized page
        new_page = output_doc.new_page(width=LETTER_WIDTH, height=LETTER_HEIGHT)

        # Calculate scaling to fit image on Letter page while maintaining aspect ratio
        page_rect = pymupdf.Rect(0, 0, LETTER_WIDTH, LETTER_HEIGHT)

        # Get the image dimensions in points (convert back from pixels)
        img_width = pix.width / zoom
        img_height = pix.height / zoom

        # Calculate scaling factor to fit within Letter size
        scale_x = LETTER_WIDTH / img_width
        scale_y = LETTER_HEIGHT / img_height
        scale = min(scale_x, scale_y)  # Use smaller scale to fit entirely

        # Calculate centered position
        scaled_width = img_width * scale
        scaled_height = img_height * scale
        x_offset = (LETTER_WIDTH - scaled_width) / 2
        y_offset = (LETTER_HEIGHT - scaled_height) / 2

        # Create rectangle for image placement
        img_rect = pymupdf.Rect(x_offset, y_offset,
                                x_offset + scaled_width,
                                y_offset + scaled_height)

        # Insert the image
        new_page.insert_image(img_rect, pixmap=pix)

    # Save the flattened PDF
    output_doc.save(output_pdf)
    output_doc.close()
    doc.close()
    print(f"Flattened PDF saved to {output_pdf}")
Usage
flatten_pdf_to_letter_size("input.pdf", "output_flattened.pdf", dpi=150)
Hope this helps!
Thanks for your support, @Jamie_Lemon
Yes, I tried this.
But my requirement is that if a page is digital (not scanned) and has a different page size, the output page must preserve the text and be fitted to “Letter”.
In any case, I have already handled scanned pages with insert_image().
Well, in this case you should extract the information and then add it all to new Letter-format pages. How you do this depends on the design of the original page, so it might be a challenge. You need to look at Text - PyMuPDF documentation & Page - PyMuPDF documentation
Alternatively, you could just try to scale the content into your Letter PDF, something like:
import pymupdf

doc = pymupdf.open()  # new empty PDF
fmt = pymupdf.paper_rect("letter")
page = doc.new_page(width=fmt.width, height=fmt.height)
src = pymupdf.open("input.pdf")  # show page 0 of this

# Scale factor (2.0 = 200%, 0.5 = 50%)
scale_factor = 0.5

# Create transformation matrix
matrix = pymupdf.Matrix(scale_factor, scale_factor)

page.show_pdf_page(
    page.rect * matrix,  # New size
    src,
    0
)

doc.save("output.pdf")