Hi,
I need to take an existing PDF and add a blank page before page 1. The new page must match the trim, bleed and media boxes of the original document.
The issue happens on some PDFs, which I assume are malformed. On these PDFs I’m unable to add a bleed box and get the error: BleedBox not in MediaBox.
I’ve tried:
- Confirming sizes and coordinates match.
- Changing the order the page boxes are set.
- Forcing the bleed box to be the same size as the media box.
- Making the bleed box slightly smaller (0.01 or 0.1 smaller) than media box.
- Adding a clone: “bleed_box = fitz.Rect(media_box)”
But no matter what I try, the script fails with the same “BleedBox not in MediaBox” error when setting the bleed box.
# Validated & rounded boxes
boxes = {
    "media": media_box,
    "bleed": bleed_box,
    "trim": trim_box,
}
# Create the new info sheet page
info_page = output_doc.new_page(width=boxes["media"].width, height=boxes["media"].height)
# Set the validated boxes on the new page
info_page.set_mediabox(boxes["media"])
info_page.set_bleedbox(boxes["bleed"])
Any help will be greatly appreciated.
Hi @AdrianC ! I think your problem might be this line:
info_page = output_doc.new_page(width=boxes["media"].width, height=boxes["media"].height)
You should be declaring the page index to insert the page at. Instead you pass the width of the document, so I think you mean:
info_page = doc.new_page(0, width=page.bound().width, height=page.bound().height)
This inserts the page at the first index, as you might expect. Right now you are inserting it at index = page width, so it goes to the last page, and your second parameter, height, is defining the width! See: Document - PyMuPDF 1.26.3 documentation
So no wonder the page has confused metrics!
Oh wait - my bad - you do explicitly define width and height - my mind was in a weird place!!! Sorry - let me think about this …
@AdrianC Might be something wrong with your PDF - can you supply it? Or if you can’t then what happens if you do:
print(boxes["media"])
print(boxes["bleed"])
Would love to know what those box rectangles are!
@AdrianC - please let us have the print out of one of the pages having the desired bloody boxes. I’ll make a little script to set these values in the new page.
page = doc[pno] # some existing page with those boxes
print(doc.xref_object(page.xref))  # we need this output
Book.pdf (1.3 MB)
I’ve attached a sample pdf that is failing.
The box print out of the source PDF (attached) is:
(From inside the “add_info_sheet” function)
Media Box: Rect(-8.63, -8.63, 637.67, 637.67)
Bleed Box: Rect(-8.63, 0.0, 637.67, 646.3)
Trim Box: Rect(8.38, 17.01, 620.66, 629.29)
Print from def main()
<<
/BleedBox [ -8.629632 -8.629632 637.6696 637.6696 ]
/Contents 12 0 R
/CropBox [ -8.629632 -8.629632 637.6696 637.6696 ]
/MediaBox [ -8.629632 -8.629632 637.6696 637.6696 ]
/Parent 98 0 R
/QITE_pageid <<
/D (D:20250819101106)
/F 121 0 R
/I 62 0 R
/P 2
/UF 122 0 R
>>
/Resources <<
/Font <<
/F1 106 0 R
/F2 106 0 R
/F3 72 0 R
>>
/ProcSet [ /PDF /ImageC /Text /ImageB /ImageI ]
/Properties <<
/Prop1 84 0 R
>>
/XObject <<
/image 86 0 R
>>
>>
/TrimBox [ 8.378268 8.378268 620.66177 620.66177 ]
/Type /Page
>>
It looks like I’m getting a discrepancy on the values from the source after I send it to the function?
def add_info_sheet(output_doc: fitz.Document, src_page: fitz.Page, item_id: str, item_index: int, total_items: int, sides: int, batch_no: str):
    # --- Get the boxes from the source page ---
    media_box = src_page.mediabox
    bleed_box = src_page.bleedbox
    trim_box = src_page.trimbox
    print(f"Media Box: {media_box}\nBleed Box: {bleed_box}\nTrim Box: {trim_box}")
    ...
def main():
    src = fitz.open(item_pdf_path)
    print(src.xref_object(src[0].xref))
    # 1) Add info sheet (and blank if sides==2)
    add_info_sheet(out, src[0], item_id, idx, total_items, sides, batch_no)
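A note on the discrepancy above: the numbers printed inside the function are consistent with the raw /BleedBox and /TrimBox values having their y axis flipped against the top edge of the MediaBox, then rounded to two decimals. This is an inference from the values posted in this thread, not a quote from the PyMuPDF documentation, and `flip_y` is a hypothetical helper written only to check the arithmetic:

```python
# Raw values copied from the page object dump in this thread.
media_raw = (-8.629632, -8.629632, 637.6696, 637.6696)
bleed_raw = (-8.629632, -8.629632, 637.6696, 637.6696)
trim_raw = (8.378268, 8.378268, 620.66177, 620.66177)


def flip_y(raw_box, media_top):
    """Hypothetical helper: mirror a box vertically against the MediaBox top edge."""
    x0, y0, x1, y1 = raw_box
    return (x0, media_top - y1, x1, media_top - y0)


def rounded(box, digits=2):
    """Round every coordinate, mimicking the two-decimal printout."""
    return tuple(round(v, digits) for v in box)


media_top = media_raw[3]
bleed_shown = rounded(flip_y(bleed_raw, media_top))
trim_shown = rounded(flip_y(trim_raw, media_top))

print("Bleed:", bleed_shown)
print("Trim: ", trim_shown)
```

If this reading is right, the rectangles returned by the page properties and the raw arrays in the PDF describe the same boxes in two different coordinate conventions, which would explain why comparing them directly looks like a mismatch.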
@AdrianC - Sorry for not replying yesterday - I had an urgent private matter to take care of.
When using the standard interface, you cannot prevent Python’s handling of floats and their rounding from getting in the way.
As you seem to depend on awkward PDF *Box values with equally awkward decimal precision, the only option is to set these boxes using PyMuPDF’s low-level interface.
This also bypasses the plausibility checks that are biting you. Here is a script that hopefully does what you want:
import pymupdf
doc = pymupdf.open()
page = doc.new_page()
# show the default page creation result
print(doc.xref_object(page.xref))
# set values of certain page boxes via low-level API
doc.xref_set_key(page.xref, "MediaBox", "[ -8.629632 -8.629632 637.6696 637.6696 ]")
doc.xref_set_key(page.xref, "CropBox", "[ -8.629632 -8.629632 637.6696 637.6696 ]")
doc.xref_set_key(page.xref, "BleedBox", "[ -8.629632 -8.629632 637.6696 637.6696 ]")
doc.xref_set_key(page.xref, "TrimBox", "[ 8.378268 8.378268 620.66177 620.66177 ]")
# confirm result:
print(doc.xref_object(page.xref))
@HaraldLieder - It’s no problem at all.
Unfortunately, I can’t hardcode the sizes because the files going through the script will vary in size. But thanks to your help, I adjusted my script following your suggestion and the below works fine:
src_page = src[0]
info_page = output_doc.new_page()
# xref_get_key returns a (type, value) tuple of strings, e.g. ("array", "[...]").
# Copy the value string verbatim and skip any box that is not defined
# directly on the source page (type "null").
for key in ("MediaBox", "CropBox", "BleedBox", "TrimBox", "ArtBox"):
    key_type, value = src.xref_get_key(src_page.xref, key)
    if key_type == "array":
        output_doc.xref_set_key(info_page.xref, key, value)
For reference (in case others have a similar issue and find this post)…
Changing the media_box variable assignment to “src_page.rect” instead of “.mediabox” seems to work as well. But so far I’ve only tested with the file that failed before.
The below might be an easier approach since I need to use the sizes in ReportLab. Otherwise I would need to convert the values.
def add_info_sheet(output_doc: fitz.Document, src_page: fitz.Page,
                   item_id: str, item_index: int, total_items: int, sides: int, batch_no: str):
    # Get the boxes from the source page. The `rect` property is a safe bet.
    media_box = src_page.rect
    trim_box = src_page.trimbox
    bleed_box = src_page.bleedbox
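For anyone who stays with the low-level approach, the conversion mentioned above is small. The following is a minimal sketch, assuming the box is a direct array of four numbers written as a string such as the ones in the page dump earlier in this thread; `parse_pdf_box` and `box_size` are hypothetical helper names, not part of PyMuPDF or ReportLab:

```python
def parse_pdf_box(raw):
    """Turn a PDF array string like '[ 0 0 612 792 ]' into four floats.

    Assumes a direct array of four plain numbers (no indirect references).
    """
    numbers = [float(token) for token in raw.strip().strip("[]").split()]
    if len(numbers) != 4:
        raise ValueError(f"expected 4 numbers, got {len(numbers)}: {raw!r}")
    return tuple(numbers)


def box_size(box):
    """Return (width, height) in points for an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return (x1 - x0, y1 - y0)


# MediaBox string copied from the page object dump in this thread.
media = parse_pdf_box("[ -8.629632 -8.629632 637.6696 637.6696 ]")
width, height = box_size(media)
print(media, width, height)
```

The resulting width and height are in PDF points and could be passed on as a page size; the coordinates stay in the PDF's own bottom-left-origin convention.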
Thanks again for the help.
Cool - I had hoped that you would work out how to copy the *Box definitions over from existing pages. Congratulations!