
VLM-as-a-Judge on MLLM-as-a-Judge

Accuracy: 75.78

GPT-4o

[Chart: Accuracy over time, Apr 20, 2026]

Evaluation Results

Date      Accuracy
2026.04   75.78
2026.04   75.12
2026.04   73.37
2026.04   73.37
2026.04   71.17
2026.04   69.21
2026.04   68.34
2026.04   68.04
2026.04   67.63
2026.04   66.92
2026.04   66.9
2026.04   66.73
2026.04   66.45
2026.04   66.21
2026.04   65.79
2026.04   65.67
2026.04   65.35
2026.04   65.28
2026.04   65.14
2026.04   63.18
2026.04   63.11
2026.04   61.54
2026.04   59.48
2026.04   59.19
2026.04   59.11
2026.04   59.07
2026.04   58.95
2026.04   58.88
2026.04   58.02
2026.04   56.21
2026.04   54.69
2026.04   53.28