Hello, I’ve got a request I believe should be simple, but I can’t get it to work properly. I want to take a series of USPS label pages, cut out the top half of each page (the section with the label), then combine those halves two to a page. So if there are 8 pages of labels, I want to condense them into 4 pages, each with two labels stacked one above the other. How would I go about doing this? Thank you very much.
Hi @provscon and welcome to the forum!
So I guess if you know the exact rect where the label sits on each page, then you just need to grab that rect, create a new page for it, and add that page to a new document.
Something like this (let’s call the file “extract.py”):
import pymupdf
import sys
import os


def extract_top_half_pages(input_pdf_path, output_pdf_path):
    """Extract the top half of each page from a PDF and combine them into a new document.

    Args:
        input_pdf_path (str): Path to the input PDF file
        output_pdf_path (str): Path for the output PDF file
    """
    try:
        # Open the input PDF
        input_doc = pymupdf.open(input_pdf_path)
        # Create a new PDF document
        output_doc = pymupdf.open()
        print(f"Processing {len(input_doc)} pages...")
        for page_num in range(len(input_doc)):
            # Get the current page
            page = input_doc[page_num]
            # Get the page dimensions
            page_rect = page.rect
            page_width = page_rect.width
            page_height = page_rect.height
            # Define the crop rectangle for the top half (this should be the rectangle where your label will be)
            # pymupdf.Rect(x0, y0, x1, y1) where (x0, y0) is top-left, (x1, y1) is bottom-right
            top_half_rect = pymupdf.Rect(0, 0, page_width, page_height / 2)
            # Create a new page in the output document with the top half dimensions
            new_page = output_doc.new_page(width=page_width, height=page_height / 2)
            # Copy the top half content to the new page
            new_page.show_pdf_page(new_page.rect, input_doc, page_num, clip=top_half_rect)
            print(f"Processed page {page_num + 1}/{len(input_doc)}")
        # Save the output document
        output_doc.save(output_pdf_path)
        # Close documents
        input_doc.close()
        output_doc.close()
        print(f"Successfully created '{output_pdf_path}' with top halves of all pages.")
    except Exception as e:
        print(f"Error processing PDF: {str(e)}")
        return False
    return True


def main():
    """Main function to handle command line arguments and execute the extraction."""
    if len(sys.argv) != 3:
        print("Usage: python script.py <input_pdf> <output_pdf>")
        print("Example: python script.py document.pdf document_top_half.pdf")
        sys.exit(1)
    input_pdf = sys.argv[1]
    output_pdf = sys.argv[2]
    # Check if input file exists
    if not os.path.exists(input_pdf):
        print(f"Error: Input file '{input_pdf}' does not exist.")
        sys.exit(1)
    # Check if input file is a PDF
    if not input_pdf.lower().endswith('.pdf'):
        print("Error: Input file must be a PDF.")
        sys.exit(1)
    # Ensure output has .pdf extension
    if not output_pdf.lower().endswith('.pdf'):
        output_pdf += '.pdf'
    # Extract top halves
    success = extract_top_half_pages(input_pdf, output_pdf)
    if success:
        print("Operation completed successfully!")
    else:
        print("Operation failed!")
        sys.exit(1)


if __name__ == "__main__":
    main()
Usage would be, e.g. python extract.py input.pdf output.pdf
extract.py (2.9 KB)
Just attaching the Python file as it has come out a bit strangely there in the code above!
Hm, it looks like it’s working, but it’s giving me the wrong rectangle location. I can’t upload any of the labels I have since they have people’s addresses on them, but I did find an example picture that is nearly identical to what I’m working with.
Here’s essentially what I’m attempting to do. Grab each top half of the page for a number of pages and set 2 labels on a single page. Used an image editor to do this:
double_label.pdf (130.8 KB)
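For anyone following along, the two-per-page layout described above could be sketched roughly like this. It is only a sketch under some assumptions: every page is the same size as the first, no page has rotation applied, and the label really does occupy exactly the top half. The names `two_up_slots` and `stack_labels` are made up for illustration and are not part of PyMuPDF.

```python
def two_up_slots(page_width, page_height, n_labels):
    """Return (output_page_index, (x0, y0, x1, y1)) for each label.

    Labels 0, 2, 4, ... go in the top half of an output page,
    labels 1, 3, 5, ... in the bottom half of the same page.
    """
    half = page_height / 2
    slots = []
    for i in range(n_labels):
        y0 = 0 if i % 2 == 0 else half
        slots.append((i // 2, (0, y0, page_width, y0 + half)))
    return slots


def stack_labels(input_pdf_path, output_pdf_path):
    """Place the top half of every input page two to a page in the output."""
    import pymupdf  # imported here so two_up_slots can be used on its own

    src = pymupdf.open(input_pdf_path)
    out = pymupdf.open()
    # Assumption: all pages share the first page's size and have rotation 0.
    width, height = src[0].rect.width, src[0].rect.height
    # Source region to copy: the top half of each input page.
    clip = pymupdf.Rect(0, 0, width, height / 2)
    for pno, (out_pno, slot) in enumerate(two_up_slots(width, height, len(src))):
        if out_pno == len(out):
            # Start a new full-size output page for every second label.
            out.new_page(width=width, height=height)
        out[out_pno].show_pdf_page(pymupdf.Rect(slot), src, pno, clip=clip)
    out.save(output_pdf_path)
    out.close()
    src.close()
```

Calling `stack_labels("labels.pdf", "labels_2up.pdf")` (both file names are placeholders) on an 8-page input should give 4 output pages. With an odd number of labels, the bottom half of the last page is simply left empty.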
Try this attached code.
extract2.py (2.9 KB)
Inspired by docs here: The Basics - PyMuPDF 1.26.3 documentation
Let me know how it works for you!
Strange, it’s like the orientation of the rect is wrong when it captures the input. It’s capturing a portion of the label and a portion of the bottom Instructions rather than only the label itself. I feel like the rect is capturing a top half rect in the Landscape orientation when I need a top half rect in the Portrait orientation. Not sure how to describe it. I’ve attached a picture of what it sort of looks like.
Very likely there is some rotation applied to the pages, and that is what is causing the issue (this can happen without being visible in a viewer, because the viewer compensates for the rotation).
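One way to check this is to print `page.rotation` for each page and, where it is non-zero, call `page.remove_rotation()` before doing any clipping. The sketch below assumes a reasonably recent PyMuPDF version that provides `Page.remove_rotation()`. The helper names `visible_size` and `normalize_rotation` are hypothetical, and whether rotation is really the cause here is still a guess until the real files are checked.

```python
def visible_size(width, height, rotation):
    """Size a viewer would show for a page stored as width x height.

    A rotation of 90 or 270 degrees swaps width and height.
    """
    if rotation % 180 == 90:
        return (height, width)
    return (width, height)


def normalize_rotation(input_pdf_path, output_pdf_path):
    """Report each page's rotation, then save a copy with rotation removed."""
    import pymupdf  # imported here so visible_size can be used on its own

    doc = pymupdf.open(input_pdf_path)
    rotations = []
    for page in doc:
        rot = page.rotation  # 0, 90, 180 or 270
        rotations.append(rot)
        box = page.mediabox  # page size as stored, before rotation
        shown = visible_size(box.width, box.height, rot)
        print(f"page {page.number + 1}: rotation={rot}, "
              f"stored {box.width}x{box.height}, shown as {shown[0]}x{shown[1]}")
        if rot != 0:
            # Sets rotation to 0 while keeping the page's appearance,
            # so a "top half" rect matches what the viewer displays.
            page.remove_rotation()
    doc.save(output_pdf_path)
    doc.close()
    return rotations
```

If any page reports a non-zero rotation, running the extraction script on the normalized copy instead of the original should show whether that was the problem.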