How to convert json to pandas dataframe

mir975 · February 20, 2026, 7:01am

How can I convert the output of pymupdf4llm.to_json into a pandas DataFrame while preserving the exact row and column positions of the contents?

HaraldLieder · February 20, 2026, 4:44pm

Hi @mir975

Use the pymupdf4llm.to_json() method.
The resulting dict / json has the "pages" key which is a list of one dict per page.
Each page dict contains the key "boxes" which is a list of the layout boxes identified on the page.
Each layout box has a "boxclass" key. Its value is "table" for tables.
In that case there is also a "table" key holding all table-relevant data. For example, table["extract"] is a list of lists containing the table’s cell values, one inner list per row. This list can be passed directly to pandas.

For example:

import sys
import json

import pandas
import pymupdf.layout  # activates layout analysis; import before pymupdf4llm
import pymupdf4llm

doc = pymupdf.open(sys.argv[1])  # PDF path given on the command line
out = pymupdf4llm.to_json(doc)  # JSON string
outdict = json.loads(out)  # parse the string into a dict
page0 = outdict["pages"][0]  # dictionary for page 0
tabboxes = [b for b in page0["boxes"] if b["boxclass"] == "table"]
tab0 = tabboxes[0]["table"]  # first table of page 0
extract = tab0["extract"]  # list of lists of cell text content

# first row becomes the column header, remaining rows become the data
df = pandas.DataFrame(extract[1:], columns=extract[0])
print(df)

Gives you this:

  Boiling Points °C   min    max     avg
0       Noble gases  -269    -62  -170.5
1         Nonmetals  -253   4827   414.1
2        Metalloids   335   3900   741.5
3            Metals   357  >5000  2755.9

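
If you need every table in the document rather than only the first one on page 0, the same keys can be walked in a loop. The following is only a sketch: the helper names `iter_table_extracts` and `to_dataframe`, the `first_row_is_header` parameter, and the hand-made `sample` dict are my own illustration and not part of pymupdf4llm. It assumes the structure described above ("pages", "boxes", "boxclass", "table", "extract").

```python
def iter_table_extracts(outdict):
    """Yield (page_index, table_index, extract) for every table box.

    Assumes outdict["pages"] is a list of page dicts, each with a "boxes"
    list, and that table boxes have boxclass == "table" and keep their
    cell grid in box["table"]["extract"].
    """
    for page_index, page in enumerate(outdict["pages"]):
        tables = [b for b in page["boxes"] if b["boxclass"] == "table"]
        for table_index, box in enumerate(tables):
            yield page_index, table_index, box["table"]["extract"]


def to_dataframe(extract, first_row_is_header=True):
    """Build a DataFrame from one extract (a list of row lists)."""
    import pandas  # local import: only this helper needs pandas

    if first_row_is_header:
        return pandas.DataFrame(extract[1:], columns=extract[0])
    # No header promotion: cell (r, c) of the extract ends up at df.iat[r, c].
    return pandas.DataFrame(extract)


# Hand-made stand-in for json.loads(pymupdf4llm.to_json(doc)).
# Real output contains more keys; the "text" boxclass is illustrative only.
sample = {
    "pages": [
        {"boxes": [
            {"boxclass": "text"},
            {"boxclass": "table",
             "table": {"extract": [["Boiling Points °C", "min", "max", "avg"],
                                   ["Noble gases", "-269", "-62", "-170.5"]]}},
        ]},
        {"boxes": []},  # a page without any tables
    ]
}

found = list(iter_table_extracts(sample))
```

Each entry of `found` can then be passed to `to_dataframe`. With `first_row_is_header=False` the first row is not promoted to column names, so the DataFrame keeps the extract's grid exactly as delivered, which may be closer to what you mean by preserving row and column positions.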
