MangaLMM Official Demo

We propose MangaVQA, a benchmark, and MangaLMM, a specialized large multimodal model (LMM), for multimodal manga understanding.

This demo uses our MangaLMM model to perform OCR on an image of manga panels and answer a question about the image.

Please ensure that the image contains fewer than 2,116,800 pixels (e.g., 1600x1200 or 1920x1080). Larger images are resized to fit within this limit.
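To check an image against the pixel limit before uploading, the resize arithmetic can be sketched as follows. This is only an illustration: the function name `fit_within_pixel_budget` is hypothetical, and preserving the aspect ratio while rounding down is an assumption, since the demo's actual resizing method is not described here.

```python
import math

# Pixel limit stated in the demo description.
MAX_PIXELS = 2_116_800


def fit_within_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled down so that width * height < max_pixels.

    Hypothetical helper: assumes the aspect ratio is preserved and that
    dimensions are rounded down. The demo may resize differently.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Images already under the limit are left unchanged.
    if width * height < max_pixels:
        return width, height

    # Scale both sides by the same factor so the area shrinks to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))

    # Guard against floating-point rounding and the exact-equality case:
    # shrink the longer side until the area is strictly under the limit.
    while new_w * new_h >= max_pixels:
        if new_w >= new_h and new_w > 1:
            new_w -= 1
        elif new_h > 1:
            new_h -= 1
        else:
            break  # a 1x1 image cannot be reduced further

    return new_w, new_h
```

For example, `fit_within_pixel_budget(1920, 1080)` returns the dimensions unchanged, while a 4000x3000 scan is reduced to a size with roughly the same 4:3 aspect ratio and fewer than 2,116,800 pixels.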

Note: This model is for research purposes only and may return incorrect results. Please use it at your own risk.

Examples