Is there any way to get structured JSON output from a complex form with multiple layouts (not XFA-based) using PyMuPDF?
(Attached: ITR3_Notified Form AY 2023-24.pdf)
This is indeed a complex form!
To get a JSON representation of a page's text, use:
```python
import pymupdf

doc = pymupdf.open("form.pdf")
# Select a specific page (e.g., the first page)
page = doc[0]
# Get the page's text as a JSON string: blocks -> lines -> spans,
# each with a bounding box and font information
json_text = page.get_text("json")
print(json_text)
# Close the document
doc.close()
```
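The raw JSON is a faithful dump of the page layout, not a form-aware structure, so some post-processing is usually needed. Below is a minimal sketch, assuming you want one record per text line (text, bounding box, largest font size) in rough reading order, plus the values of any fillable AcroForm fields. The function names `page_json_to_records` and `extract_document` and the record layout are hypothetical choices for illustration, not part of PyMuPDF. `page.widgets()` only yields something if the PDF actually contains fillable fields; for a flat printed form the `"fields"` dict will be empty, and pairing labels with values has to be done from the bounding boxes.

```python
import json


def page_json_to_records(json_text):
    """Flatten the JSON from page.get_text("json") into a list of line records.

    Text blocks have "type" 0 and contain lines -> spans; image blocks
    (type 1) are skipped. Each record keeps the line text, its bounding box
    and the largest font size, so labels and values can be matched by position.
    """
    page = json.loads(json_text)
    records = []
    for block in page.get("blocks", []):
        if block.get("type") != 0:  # skip image blocks
            continue
        for line in block.get("lines", []):
            # join the spans of one line into a single string
            text = "".join(span["text"] for span in line.get("spans", []))
            if not text.strip():  # ignore empty or whitespace-only lines
                continue
            sizes = [span["size"] for span in line["spans"]]
            records.append({
                "text": text.strip(),
                "bbox": [round(v, 1) for v in line["bbox"]],
                "size": round(max(sizes), 1),
            })
    # rough reading order: top-to-bottom (y0), then left-to-right (x0)
    records.sort(key=lambda r: (r["bbox"][1], r["bbox"][0]))
    return records


def extract_document(path):
    """Return {"pages": [...]} with line records and form fields for every page."""
    import pymupdf  # imported here so the helper above works without PyMuPDF

    result = {"pages": []}
    with pymupdf.open(path) as doc:
        for page in doc:
            result["pages"].append({
                "number": page.number + 1,  # page.number is 0-based
                "text": page_json_to_records(page.get_text("json")),
                # AcroForm field values, if any; duplicate names overwrite each other
                "fields": {w.field_name: w.field_value for w in page.widgets()},
            })
    return result
```

Usage would be something like `print(json.dumps(extract_document("form.pdf"), indent=2))`. Sorting on rounded coordinates is deliberately simple; for multi-column layouts you may need to group records into columns or table cells first (for example by x-ranges) before sorting.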