Inter-Rater Reliability (Kappa/ICC) on MT-Bench
[Chart: Kappa, ICC(2,k), and ICC(3,k) over time; top result CalibraEval, Kappa 71.88 (Oct 20, 2024)]
Evaluation Results
| Method | Setting | Date | Kappa | ICC(2,k) | ICC(3,k) |
|---|---|---|---|---|---|
| CalibraEval | Base Model=Qwen-72B | 2024.10 | 71.88 | 95.7 | 96.71 |
| Qwen-72B (Default) | Debiasing=None | 2024.10 | 71.35 | 90.33 | 91.16 |
| CalibraEval | Base Model=Llama-3-8B | 2024.10 | 28.63 | 75.45 | 76.8 |
| PriDe | Base Model=Llama-3-8B | 2024.10 | 27.38 | 72.27 | 74.5 |
| DI | Base Model=Llama-3-8B | 2024.10 | 15.93 | 59.11 | 65 |
| Llama-3-8B (Default) | Debiasing=None | 2024.10 | 14.36 | 60.96 | 73.08 |
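The Kappa column measures agreement between the LLM judge's verdicts and reference labels. Assuming it is Cohen's kappa (the benchmark page does not specify the variant), a minimal sketch of the computation with hypothetical rater labels:

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is agreement expected by chance from the label marginals.
    """
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement from each rater's label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2.get(label, 0) for label in c1) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical verdicts (e.g. "A"/"B" = which response wins a pairwise comparison).
judge = ["A", "A", "B", "B"]
human = ["A", "A", "B", "A"]
print(cohen_kappa(judge, human))  # → 0.5
```

Scores in the table are on a 0-100 scale, so a printed kappa of 0.5 corresponds to a reported value of 50. The ICC(2,k) and ICC(3,k) columns are the standard Shrout-Fleiss intraclass correlation forms for averaged ratings, which additionally account for rating magnitudes rather than exact label matches.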