PDF text extraction from a complex form

Is there any way to get structured JSON output from a complex form with multiple layouts (not XFA-based) using PyMuPDF?
ITR3_Notified Form AY 2023-24.pdf (2.2 MB)

This is indeed a complex form!

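To get a JSON representation of a page's text, use get_text("json"). Note that this describes the page layout (blocks, lines, and spans with their bounding boxes), not the form's fields, so you still have to map the text to field names yourself: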

import pymupdf

doc = pymupdf.open("form.pdf")

# Select a specific page (e.g., the first page)
page = doc[0]

# Get the page text as a JSON string
# (named page_json so it does not shadow the standard json module)
page_json = page.get_text("json")

print(page_json)

# Close the document
doc.close()
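
The raw JSON is quite verbose, so as a possible next step here is a minimal sketch that flattens it into one record per text line. It assumes the usual layout of the get_text("json") output (a top-level "blocks" list, text blocks with "type" 0 containing "lines", each line containing "spans" with a "text" key). The function name flatten_text_json is my own and not part of PyMuPDF.

```python
import json


def flatten_text_json(page_json: str) -> list[dict]:
    """Flatten the string from page.get_text("json") into one record per line.

    Hypothetical helper, not a PyMuPDF API. Each record holds the line's
    bounding box and the concatenated text of its spans.
    """
    data = json.loads(page_json)
    rows = []
    for block in data.get("blocks", []):
        # Text blocks have type 0; image blocks (type 1) carry no lines.
        if block.get("type") != 0:
            continue
        for line in block.get("lines", []):
            text = "".join(span.get("text", "") for span in line.get("spans", []))
            if text.strip():
                rows.append({"bbox": line["bbox"], "text": text.strip()})
    # Order roughly top-to-bottom, then left-to-right, using the top-left corner.
    rows.sort(key=lambda r: (round(r["bbox"][1]), r["bbox"][0]))
    return rows
```

You would call it as rows = flatten_text_json(page.get_text("json")) and then json.dumps(rows, indent=2). Pairing labels with their values in a form like this one still needs your own rules based on the bounding boxes, since the layout differs from page to page.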