MangaLMM Official Demo
We propose MangaVQA and MangaLMM: a benchmark and a specialized LMM, respectively, for multimodal manga understanding.
This demo uses our MangaLMM model to perform OCR on an image of manga panels and answer a question about the image.
Please ensure that the image contains fewer than 2116800 pixels (e.g. 1600x1200 or 1920x1080). Larger images are automatically resized to a smaller size.
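To check in advance whether an image will be resized, the pixel budget can be computed by hand. The following is a minimal sketch, not the demo's actual resizing code: the function name `fit_within_pixel_budget` is hypothetical, and it assumes that the aspect ratio is preserved and that an image at exactly the limit is accepted. The demo may round dimensions differently.

```python
import math

# Pixel limit stated in the demo description.
MAX_PIXELS = 2_116_800


def fit_within_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled down so that width * height <= max_pixels.

    Hypothetical helper: assumes aspect-ratio-preserving downscaling,
    which may differ from what the demo does internally.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Images already within the budget are returned unchanged.
    if width * height <= max_pixels:
        return width, height

    # Scale both sides by the same factor so the area shrinks to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))

    # Guard against floating-point rounding pushing the area over the limit.
    while new_w * new_h > max_pixels:
        if new_w >= new_h:
            new_w -= 1
        else:
            new_h -= 1
    return new_w, new_h


if __name__ == "__main__":
    print(fit_within_pixel_budget(1920, 1080))  # within budget, unchanged
    print(fit_within_pixel_budget(4000, 3000))  # over budget, scaled down
```

If Pillow is available, the returned size can be passed to `Image.resize` to downscale the image locally before uploading.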
Note: This model is for research purposes only and may return incorrect results. Please use it at your own risk.