
OtterHD: A High-Resolution Multi-modality Model

About

In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models that are constrained by fixed-size vision encoders, OtterHD-8B boasts the ability to handle flexible input dimensions, ensuring its versatility across various inference requirements. Alongside this model, we introduce MagnifierBench, an evaluation framework designed to scrutinize models' ability to discern minute details and spatial relationships of small objects. Our comparative analysis reveals that while current leading models falter on this benchmark, OtterHD-8B, particularly when directly processing high-resolution inputs, outperforms its counterparts by a substantial margin. The findings illuminate the structural variances in visual information processing among different models and the influence that the vision encoders' pre-training resolution disparities have on model effectiveness within such benchmarks. Our study highlights the critical role of flexibility and high-resolution input capabilities in large multimodal models and also exemplifies the potential inherent in the Fuyu architecture's simplicity for handling complex visual data.

Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu • 2023

Related benchmarks

Task                             Dataset    Result         Rank
Visual Question Answering        VizWiz     Accuracy 44.9  1525
Object Hallucination Evaluation  POPE       Accuracy 86.1  1455
Visual Question Answering        VQA v2     Accuracy 80.7  1362
Visual Question Answering        TextVQA    Accuracy 61.2  1285
Multimodal Evaluation            MME        Score 1.22e+3  658
Multimodal Understanding         MMBench    --             637
Multimodal Understanding         MM-Vet     --             531
Visual Question Answering        ScienceQA  Accuracy 86    370
Multimodal Model Evaluation      MMBench    Accuracy 58.3  180
Hallucination Evaluation         POPE       Accuracy 86    153
Showing 10 of 16 rows
