
Human Consistency Evaluation on MLLM-as-a-Judge

30.3 CO Consistency Score

LLaVA-Critic

[Chart: Consistency Score by date (Jun 3, 2025)]
Updated 1mo ago

Evaluation Results

Method  Links
2025.06  30.3  33.8  2.9  34.8  38.1  37.4  13.5  43.1  34.8  20.4  7.6  25.6  17  29.3
2025.06  29.9  34.1  36  48.1  42.1  47.4  28.6  51.2  40.5  18.9  -0.2  34.5  18.8  23.2
2025.06  26.9  30.3  17.6  37.7  37.8  35.4  29.5  48.7  29.2  21.4  7.9  34.8  20.9  28.3
2025.06  24.9  35.7  23.1  48.5  41.4  44.5  36.6  50.9  35  25.2  12  38.7  23.8  25.8
2025.06  23.7  31.9  13.8  29.5  34.2  35  20.2  35.7  30  18.3  12.5  31.4  27  14.9
2025.06  23.1  28.8  16.1  36.8  35.9  39.6  30.3  47.2  31.9  18.3  1.7  28.8  20.4  29
2025.06  20.2  13.3  7.5  14.2  24.1  22.2  4.6  19.7  27.2  21.9  19  13.8  19.6  26.9
2025.06  19.2  31.6  10.3  30.1  23.3  29.1  8.3  25  25.4  15.6  14.8  7.7  28.6  25.3
2025.06  17.3  31.2  23.6  49.1  40.9  41  37.9  49.3  36.4  22  -0.7  31.7  22.9  22.6
2025.06  15.3  16.9  10.4  12.9  11.5  19.8  3.3  28.2  19.6  15.9  17.5  6.6  26.9  26
2025.06  11.2  9.1  9.4  12.7  11.1  17.8  18  8.4  11.5  15.5  11.4  8.2  24.1  11