Images within a table not extracted

Hi,

I am trying to create Markdown from a PDF, and I have run into an issue with images that are embedded within a table.

PDF I am trying to extract: https://cars.tatamotors.com/content/dam/tml/pv/general/service/owners-manual/pdf/harrier/harrier-bs6-owners-manual-april-2026.pdf

Refer to Pages: 64, 65

The images in the Pictogram column are not being extracted.

I searched the forum and tried different options, setting image_size_limit=0 and ignore_graphics=False, but neither of them works.

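import pymupdf4llm

FILE = "harrier-bs6-owners-manual-april-2026.pdf"

# pages expects a list of 0-based page numbers: PDF pages 64 and 65 are 63 and 64
md_text = pymupdf4llm.to_markdown(FILE, pages=[63, 64], header=False, footer=False, embed_images=True, image_size_limit=0, ignore_graphics=False)

# write the Markdown to disk; the context manager closes the file
with open("out-markdown.md", "w", encoding="utf-8") as output:
    output.write(md_text)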

Welcome to the Forum @Viswa !

Images, hyperlinks and vector graphics inside table cells are currently out of scope - sorry.

Thanks for the quick reply. Is there any plan to include this feature in a later release that I could look out for?

Also, is there a way to detect that a table contains an image that was not extracted?

We do intend to support this, but there exists no schedule yet: our list of planned enhancements is loooong :smiling_face_with_sunglasses:.

But you can easily determine whether there are images inside any region of the page, e.g. inside the bbox of a table, a cell, or anything else:

import pymupdf
images = page.get_image_info()  # list of images on page (metadata only)
# keep the images whose bbox lies inside the given bbox (e.g. a table or cell)
images_in_bbox = [img for img in images if img["bbox"] in pymupdf.Rect(bbox)]
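For the "is there an image in a table that was not extracted?" question, the same check can be run against every table that PyMuPDF detects on a page. What follows is only a rough sketch, not how pymupdf4llm works internally: the helper names bbox_contains and images_in_tables and the 1-point tolerance are assumptions made for illustration. It relies on page.find_tables() and page.get_image_info() from PyMuPDF. Note that get_image_info() only reports raster images, so pictograms drawn as vector graphics would not show up here (page.get_drawings() would be the place to look for those).

```python
def bbox_contains(outer, inner, tol=1.0):
    """True if rectangle `inner` lies inside `outer`, within `tol` points.

    Both rectangles are (x0, y0, x1, y1) tuples. The tolerance is an
    assumption: image and table edges often differ by a fraction of a point.
    """
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return (
        ix0 >= ox0 - tol
        and iy0 >= oy0 - tol
        and ix1 <= ox1 + tol
        and iy1 <= oy1 + tol
    )


def images_in_tables(pdf_path, page_numbers):
    """Return {page_number: [image bbox, ...]} for images inside detected tables."""
    import pymupdf  # imported here so bbox_contains works without PyMuPDF

    found = {}
    with pymupdf.open(pdf_path) as doc:
        for pno in page_numbers:  # 0-based page numbers
            page = doc[pno]
            images = page.get_image_info()  # metadata only, no pixel data
            hits = []
            for table in page.find_tables().tables:
                hits += [
                    img["bbox"]
                    for img in images
                    if bbox_contains(table.bbox, img["bbox"])
                ]
            if hits:
                found[pno] = hits
    return found
```

Calling images_in_tables("harrier-bs6-owners-manual-april-2026.pdf", [63, 64]) would then list, per page, the bounding boxes of raster images that sit inside a detected table. A non-empty result for a page whose Markdown output shows no image in that table is a hint that something was skipped.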