In the BDC dictionary I store the page and line number. Is there an easy way to get the position on the page of a marked-content section when I know the page and the marked-content name (for example, page 1, /Line1)? Right now there is doc.resolve_names(), which sounds like it should return object names but is dedicated only to destinations (why not call it doc.get_destinations(), then?).
import fitz

doc = fitz.open("qbody.pdf")
for page in doc:
    # print the raw content stream(s) of each page
    for cont in page.get_contents():
        print(doc.xref_stream(cont))

names = doc.resolve_names()
print(names)
No, there is no such access.
You must access the /Contents of the page and hack your way through it.
And you would have to decipher the /Properties object of the page to look up objects like R13 = << /Page (1) /Pos /Left >> etc.
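As a rough illustration of what "hacking through" the /Contents could look like, here is a minimal sketch that scans a raw content stream (the bytes returned by doc.xref_stream(cont)) for `/Tag /Name BDC ... EMC` spans. The function name find_marked_content and the sample stream are made up for this example. It assumes the BDC operator references a named /Properties entry (not an inline dictionary), that marked-content sections are not nested, and that the operator names do not occur inside string literals, so treat it as a starting point rather than a real content-stream parser.

```python
import re

# Matches "/Tag /PropName BDC <operators> EMC" (named-property form only).
_BDC_SPAN = re.compile(
    rb"/([^\s/<>\[\]()]+)\s+/([^\s/<>\[\]()]+)\s+BDC\b(.*?)\bEMC\b",
    re.DOTALL,
)


def find_marked_content(stream: bytes):
    """Return (tag, property_name, raw_operators) for each BDC...EMC span."""
    spans = []
    for m in _BDC_SPAN.finditer(stream):
        tag = m.group(1).decode("latin-1")
        prop = m.group(2).decode("latin-1")
        spans.append((tag, prop, m.group(3).strip()))
    return spans


# Hypothetical content stream, for illustration only.
sample = (
    b"/Span /R13 BDC\nBT /F1 12 Tf 72 700 Td (Hello) Tj ET\nEMC\n"
    b"/Span /R17 BDC\nBT 72 680 Td (World) Tj ET\nEMC\n"
)
spans = find_marked_content(sample)
for tag, prop, body in spans:
    print(tag, prop, body)
```

The operators captured between BDC and EMC (for example the Td or Tm operands) are what you would still have to interpret to work out an actual position on the page.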
You could use PyMuPDF’s submodule mupdf for easy access to the page object’s dictionaries. This is also quite hacky … but possible if you know what you are doing:
import pymupdf

doc = pymupdf.open("qbody.pdf")
page = doc[0]  # the page to inspect

mupdf = pymupdf.mupdf  # sub-module
pdfpage = pymupdf._as_pdf_page(page)  # underlying PDF page
# step through the resources to access /Properties
resources = mupdf.pdf_dict_get(pdfpage.obj(), pymupdf.PDF_NAME("Resources"))
# now the Properties object:
props = resources.pdf_dict_get(pymupdf.PDF_NAME("Properties"))
# iterate the properties sub dicts:
for i in range(props.pdf_dict_len()):
    k = props.pdf_dict_get_key(i)
    v = props.pdf_dict_get_val(i)
    # print the property name and the source of the object it refers to
    print(k.pdf_to_name(), doc.xref_object(v.pdf_to_num(), compressed=True))
R17 <</Page(1)/Pos/Left>>
R19 <</Page(1)/Pos/Left>>
R13 <</Page(1)/Pos/Left>>
Of course you could continue and cleanly extract the /Page and /Pos values … as opposed to retrieving the object’s string as I did.
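If you stay with the string route, one simple option is to parse the compressed object string into a Python dict. This is a minimal sketch, not a PDF parser: parse_flat_pdf_dict is a made-up helper, and it assumes a flat dictionary whose values are only string literals, names, or plain numbers, like the objects printed above. Nested dictionaries, arrays, and indirect references are not handled.

```python
import re

# One "/Key value" entry: value is a (string), a /Name, or a number.
_ENTRY = re.compile(
    r"/(\w+)\s*"                 # key name
    r"(\((?:[^()\\]|\\.)*\)"     # string literal, e.g. (1)
    r"|/\w+"                     # name, e.g. /Left
    r"|[-+]?\d+(?:\.\d+)?)"      # integer or real number
)


def parse_flat_pdf_dict(text: str) -> dict:
    """Parse a flat PDF dictionary string such as '<</Page(1)/Pos/Left>>'."""
    result = {}
    for key, raw in _ENTRY.findall(text):
        if raw.startswith("("):
            value = raw[1:-1]        # drop the surrounding parentheses
        elif raw.startswith("/"):
            value = raw[1:]          # drop the leading slash of a name
        else:
            value = float(raw) if "." in raw else int(raw)
        result[key] = value
    return result


props = parse_flat_pdf_dict("<</Page(1)/Pos/Left>>")
print(props["Page"], props["Pos"])
```

In the loop above you would call it on doc.xref_object(v.pdf_to_num(), compressed=True) and key the results by k.pdf_to_name().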