MLLM-as-a-Judge

Benchmarks

Task Name	Dataset Name	SOTA Result
VLM-as-a-Judge	MLLM-as-a-Judge	Accuracy 75.78	32
Multimodal Evaluation Consistency	MLLM-as-a-Judge	CO Score 39.6	22
Large Multimodal Model Evaluation	MLLM-as-a-Judge v1.0 (test)	Overall Score 49	16
Reward Modeling	MLLM-as-a-Judge (MaaJ)	Accuracy 72.18	13
Human Consistency Evaluation	MLLM-as-a-Judge	CO Consistency Score 30.3	11
Pointwise Scoring	MLLM-as-a-Judge in-domain v1.0 (test)	ImageDC Score 80.2	9
