
Human Consistency Evaluation on GenAI-Bench

38.4 Kendall's Tau-c

VQAScore

May 20, 2026
Updated 12d ago

Evaluation Results

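Method | Links
2026.05 | 38.4 | - | 37.9 | 49.9
2026.05 | 35.6 | - | 35.1 | 46.3
2026.05 | 35.6 | - | 35.1 | 46.5
2026.05 | 34 | - | 33.5 | 44.5
2026.05 | 33.1 | - | 32.7 | 43.2
2026.05 | 33.1 | - | 32.7 | 43.2
2026.05 | 28 | - | 27.6 | 36.7
2026.05 | 16.3 | - | 16.1 | 21.6
2026.05 | 15 | - | 14.8 | 19.8
2026.05 | 14.6 | - | 14.4 | 19.3
2026.05 | 13.4 | - | 13.2 | 17.7
2026.05 | 12.9 | - | 12.7 | 17.1
2026.05 | 12.2 | - | 12.1 | 16.3
2026.05 | 11.8 | - | 11.6 | 15.6
2026.05 | 10 | - | 9.9 | 13.3
2026.05 | 1.5 | - | 1.4 | 1.9
2025.06 | - | 61.6 | - | -
2025.06 | - | 53.6 | - | -
2025.06 | - | 13.6 | - | -
2025.06 | - | 46.3 | - | -
2025.06 | - | 53.6 | - | -
2025.06 | - | 16 | - | -
2025.06 | - | 29.8 | - | -
2025.06 | - | 46.8 | - | -
2025.06 | - | 54.4 | - | -
2025.06 | - | 54.7 | - | -
2025.06 | - | 55.9 | - | -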