MLLM-as-a-Judge

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Large Multimodal Model Evaluation | MLLM-as-a-Judge v1.0 (test) | Overall Score: 49 | 16
Reward Modeling | MLLM-as-a-Judge (MaaJ) | Accuracy: 72.18 | 13
Pointwise Scoring | MLLM-as-a-Judge in-domain v1.0 (test) | ImageDC Score: 80.2 | 9