VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

About

We present VLMEvalKit, an open-source, PyTorch-based toolkit for evaluating large multi-modality models. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement more than 200 large multi-modality models, including both proprietary APIs and open-source models, as well as more than 80 multi-modal benchmarks. A new model can be added by implementing a single interface; the toolkit then automatically handles the remaining workload, including data preparation, distributed inference, prediction post-processing, and metric calculation. Although the toolkit is currently used mainly for evaluating large vision-language models, its design is compatible with future updates that incorporate additional modalities, such as audio and video. Based on the evaluation results obtained with the toolkit, we host the OpenVLM Leaderboard, a comprehensive leaderboard that tracks the progress of multi-modality learning research. The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.
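
The "single interface" idea in the abstract can be illustrated with a small sketch. Everything below is hypothetical and written for illustration only: the names `MultiModalModel`, `Sample`, `normalize`, `evaluate`, and `AlwaysYes` are assumptions, not VLMEvalKit's actual API. The sketch covers only inference, a toy post-processing step, and exact-match accuracy; data preparation and distributed inference are left out.

```python
from dataclasses import dataclass
from typing import Protocol


class MultiModalModel(Protocol):
    """Hypothetical single interface a model would implement (not the toolkit's API)."""

    def generate(self, image_path: str, prompt: str) -> str: ...


@dataclass
class Sample:
    """One benchmark item: an image, a question, and a reference answer."""

    image_path: str
    question: str
    answer: str


def normalize(prediction: str) -> str:
    """Toy post-processing: trim whitespace, drop trailing periods, lowercase."""
    return prediction.strip().rstrip(".").lower()


def evaluate(model: MultiModalModel, samples: list[Sample]) -> float:
    """Run inference on every sample and return exact-match accuracy."""
    if not samples:
        raise ValueError("samples must not be empty")
    correct = 0
    for sample in samples:
        raw = model.generate(sample.image_path, sample.question)
        if normalize(raw) == normalize(sample.answer):
            correct += 1
    return correct / len(samples)


class AlwaysYes:
    """Stand-in model used only to exercise the evaluation loop."""

    def generate(self, image_path: str, prompt: str) -> str:
        return " Yes. "


if __name__ == "__main__":
    data = [
        Sample("img_0.jpg", "Is there a dog?", "yes"),
        Sample("img_1.jpg", "Is there a cat?", "no"),
    ]
    print(evaluate(AlwaysYes(), data))
```

Because `evaluate` depends only on the `generate` method, any object exposing that method can be scored without changes to the loop, which is the property the abstract attributes to the toolkit's design.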

Haodong Duan, Xinyu Fang, Junming Yang, Xiangyu Zhao, Yuxuan Qiao, Mo Li, Amit Agarwal, Zhe Chen, Lin Chen, Yuan Liu, Yubo Ma, Hailong Sun, Yifan Zhang, Shiyin Lu, Tack Hwa Wong, Weiyun Wang, Peiheng Zhou, Xiaozhe Li, Chaoyou Fu, Junbo Cui, Jixuan Chen, Enxin Song, Song Mao, Shengyuan Ding, Tianhao Liang, Zicheng Zhang, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen • 2024

Related benchmarks

Task                                          Dataset                               Result                      Rank
Hallucination and Visual Reasoning Evaluation HallusionBench                        Accuracy (aACC): 68.5       40
Object Hallucination Detection                POPE                                  Accuracy: 89.1              11
Multimodal Capability Evaluation              MMStar                                CP Score: 72                11
General Multimodal Performance                POPE, HallusionBench, MMStar Average  Overall Score: 66.5         11
Open-ended Question Answering                 OKVQA                                 LVM Evaluation Score: 70.8  6