Set_bleedbox issue

Hi,

I have a situation where I need to take an existing pdf and add a blank page before page 1. However, it must match the trim, bleed and media boxes of the original document.

The issue happens on some pdfs which I assume are malformed. On these PDFs I’m unable to add a bleed box, getting the error: BleedBox not in MediaBox.

I’ve tried:

  • Confirming sizes and coordinates match.
  • Changing the order the page boxes are set.
  • Forcing the bleed box to be the same size as the media box.
  • Making the bleed box slightly smaller (0.01 or 0.1 smaller) than media box.
  • Adding a clone: “bleed_box = fitz.Rect(media_box)”

But no matter what I try, the script fails with the same “BleedBox not in MediaBox” error when setting the bleed box.

# Validated & rounded boxes
boxes = {
    "media": media_box,
    "bleed": bleed_box,
    "trim": trim_box,
}

# Create the new info sheet page
info_page = output_doc.new_page(width=boxes["media"].width, height=boxes["media"].height)

# Set the validated boxes on the new page
info_page.set_mediabox(boxes["media"])
info_page.set_bleedbox(boxes["bleed"])

Any help will be greatly appreciated.

Hi @AdrianC ! I think your problem might be this line:
info_page = output_doc.new_page(width=boxes["media"].width, height=boxes["media"].height)

You should be declaring the page index to insert the page at; instead you pass the width of the document, so I think you mean:

info_page = doc.new_page(0, width=page.bound().width, height=page.bound().height)

This will insert the page at the first index, as you might expect. Right now you are inserting it at index = page width, so it will go to the last page, and your second parameter, height, is defining the width! (See: Document - PyMuPDF 1.26.3 documentation.)
So no wonder the page has confused metrics!

Oh wait - my bad - you do explicitly define width and height - my mind was in a weird place!!! Sorry - let me think about this …

@AdrianC Might be something wrong with your PDF - can you supply it? Or if you can’t then what happens if you do:

print(boxes["media"])
print(boxes["bleed"])

Would love to know what those box rectangles are!

@AdrianC - please let us have the print out of one of the pages having the desired bloody boxes. I’ll make a little script to set these values in the new page.

page = doc[pno]   # some existing page with those boxes
print(doc.xref_object(page.xref))  # we need this output

Book.pdf (1.3 MB)
I’ve attached a sample pdf that is failing.

The box print out of the source PDF (attached) is:

(From inside the “add_info_sheet” function)
Media Box: Rect(-8.63, -8.63, 637.67, 637.67)
Bleed Box: Rect(-8.63, 0.0, 637.67, 646.3)
Trim Box: Rect(8.38, 17.01, 620.66, 629.29)

Print from def main()

<<
  /BleedBox [ -8.629632 -8.629632 637.6696 637.6696 ]
  /Contents 12 0 R
  /CropBox [ -8.629632 -8.629632 637.6696 637.6696 ]
  /MediaBox [ -8.629632 -8.629632 637.6696 637.6696 ]
  /Parent 98 0 R
  /QITE_pageid <<
    /D (D:20250819101106)
    /F 121 0 R
    /I 62 0 R
    /P 2
    /UF 122 0 R
  >>
  /Resources <<
    /Font <<
      /F1 106 0 R
      /F2 106 0 R
      /F3 72 0 R
    >>
    /ProcSet [ /PDF /ImageC /Text /ImageB /ImageI ]
    /Properties <<
      /Prop1 84 0 R
    >>
    /XObject <<
      /image 86 0 R
    >>
  >>
  /TrimBox [ 8.378268 8.378268 620.66177 620.66177 ]
  /Type /Page
>>

It looks like I’m getting a discrepancy on the values from the source after I send it to the function?

def add_info_sheet(output_doc: fitz.Document, src_page: fitz.Page, item_id: str, item_index: int, total_items: int, sides: int, batch_no: str):
   # --- Get the boxes from the source page ---
   media_box = src_page.mediabox
   bleed_box = src_page.bleedbox
   trim_box  = src_page.trimbox
   print(f"Media Box: {media_box}\nBleed Box: {bleed_box}\nTrim Box: {trim_box}")
   ...

def main():
   src = fitz.open(item_pdf_path)
   print(src.xref_object(src[0].xref))

   # 1) Add info sheet (and blank if sides==2)
   add_info_sheet(out, src[0], item_id, idx, total_items, sides, batch_no)
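
A possible explanation for the differing numbers, offered as an assumption rather than confirmed library behaviour: the printed Bleed Box and Trim Box values are consistent with the raw PDF values being reported with the y axis mirrored around the top edge (y1) of the MediaBox, while the MediaBox itself is reported unchanged. The sketch below is plain Python arithmetic on the values from the dump above; `flip_y` and `rounded` are hypothetical helpers, not PyMuPDF functions.

```python
# Raw values copied from the page object dump above.
MEDIABOX = (-8.629632, -8.629632, 637.6696, 637.6696)
BLEEDBOX = (-8.629632, -8.629632, 637.6696, 637.6696)
TRIMBOX = (8.378268, 8.378268, 620.66177, 620.66177)


def flip_y(box, media_top):
    """Hypothetical helper: mirror a box vertically around the MediaBox top edge."""
    x0, y0, x1, y1 = box
    return (x0, media_top - y1, x1, media_top - y0)


def rounded(box, digits=2):
    """Round every coordinate the way the printout appears to."""
    return tuple(round(v, digits) for v in box)


media_top = MEDIABOX[3]
bleed_flipped = rounded(flip_y(BLEEDBOX, media_top))
trim_flipped = rounded(flip_y(TRIMBOX, media_top))

print("Bleed Box:", bleed_flipped)
print("Trim Box: ", trim_flipped)
```

The rounded results agree with the Bleed Box and Trim Box values printed inside add_info_sheet, so the source values may not actually be changing between main() and the function; they may only be expressed in a different coordinate system.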

@AdrianC - Sorry for not replying yesterday; I had an urgent private matter to take care of.

When using the standard interface, you cannot avoid Python’s handling of floats and their rounding getting in the way.
As you seem to depend on awkward PDF *Box values with equally awkward decimal precision, the only option is to set these boxes using PyMuPDF’s low-level interface.
This also avoids the plausibility checks that are biting you. Here is a script that hopefully does what you want:

import pymupdf

doc = pymupdf.open()
page = doc.new_page()

# show the default page creation result
print(doc.xref_object(page.xref))

# set values of certain page boxes via low-level API
doc.xref_set_key(page.xref, "MediaBox", "[ -8.629632 -8.629632 637.6696 637.6696 ]")
doc.xref_set_key(page.xref, "CropBox", "[ -8.629632 -8.629632 637.6696 637.6696 ]")
doc.xref_set_key(page.xref, "BleedBox", "[ -8.629632 -8.629632 637.6696 637.6696 ]")
doc.xref_set_key(page.xref, "TrimBox", "[ 8.378268 8.378268 620.66177 620.66177 ]")

# confirm result:
print(doc.xref_object(page.xref))
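
To illustrate how rounding alone can make a containment check fail, here is a minimal sketch in plain Python. The `contains` helper is a hypothetical stand-in for whatever check the library performs, and rounding to two decimals is an assumption based on the "rounded boxes" comment in the original snippet.

```python
def contains(outer, inner):
    """Hypothetical strict check: True if `inner` lies fully inside `outer`."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


# MediaBox exactly as stored in the source PDF.
media = (-8.629632, -8.629632, 637.6696, 637.6696)

# A bleed box with the same extent, but rounded to two decimals first.
bleed_rounded = tuple(round(v, 2) for v in media)

same_ok = contains(media, media)             # identical boxes pass
rounded_ok = contains(media, bleed_rounded)  # rounded box sticks out slightly

print(same_ok, rounded_ok)
```

Here the rounded box becomes (-8.63, -8.63, 637.67, 637.67), which extends past the unrounded MediaBox by roughly 0.0004 on every side, so a strict check rejects it even though the two boxes were meant to be identical.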

@HaraldLieder - It’s no problem at all.

Unfortunately, I can’t hardcode the sizes because the files going through the script will vary in size. But thanks to your help, I adjusted my script following your suggestion, and the below works fine:

src_page = src[0]

# xref_get_key returns a (type, value) tuple of strings;
# for a page box the value is the raw PDF array string
media_box = src.xref_get_key(src_page.xref, "MediaBox")
crop_box  = src.xref_get_key(src_page.xref, "CropBox")
bleed_box = src.xref_get_key(src_page.xref, "BleedBox")
trim_box  = src.xref_get_key(src_page.xref, "TrimBox")
art_box   = src.xref_get_key(src_page.xref, "ArtBox")

info_page = output_doc.new_page()

# helper function: copy a box to the new page if the source page defines it
def set_box(key, box):
    kind, value = box
    if kind == "array":
        output_doc.xref_set_key(info_page.xref, key, value)

set_box("MediaBox", media_box)
set_box("CropBox",  crop_box)
set_box("BleedBox", bleed_box)
set_box("TrimBox",  trim_box)
set_box("ArtBox",   art_box)

For reference (in case others have a similar issue and find this post)…
Changing the media_box variable assignment to “src_page.rect” instead of “.mediabox” seems to work as well. But so far I’ve only tested with the file that failed before.

The below might be an easier approach since I need to use the sizes in ReportLab. Otherwise I would need to convert the values.

def add_info_sheet(output_doc: fitz.Document, src_page: fitz.Page,
                   item_id: str, item_index: int, total_items: int, sides: int, batch_no: str):

    # Get the boxes from the source page. The `rect` property is a safe bet.
    media_box = src_page.rect
    trim_box = src_page.trimbox
    bleed_box = src_page.bleedbox
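
If the raw PDF array strings are used instead, converting them into numbers for ReportLab could look like the sketch below. `parse_pdf_box` and `box_size` are hypothetical helpers, and the input is assumed to be a plain numeric array like the ones in the page dump above (no indirect references).

```python
def parse_pdf_box(array_str):
    """Parse a PDF array string like "[ 0 0 595 842 ]" into four floats."""
    values = [float(tok) for tok in array_str.strip().strip("[]").split()]
    if len(values) != 4:
        raise ValueError(f"expected 4 numbers, got {len(values)}: {array_str!r}")
    x0, y0, x1, y1 = values
    return (x0, y0, x1, y1)


def box_size(box):
    """Width and height in PDF points, usable as a (width, height) page size."""
    x0, y0, x1, y1 = box
    return (abs(x1 - x0), abs(y1 - y0))


media = parse_pdf_box("[ -8.629632 -8.629632 637.6696 637.6696 ]")
width, height = box_size(media)
print(media)
print(round(width, 6), round(height, 6))
```

The resulting (width, height) pair is in points, which is the unit ReportLab page sizes use, so no further unit conversion should be needed under these assumptions.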

Thanks again for the help.

Cool - I had hoped that you would get the idea of how to copy *Box definitions over from existing pages. Congratulations!