Human Correlation on ToolBench
Chart: Pearson r over time (best: ADARUBRIC-DA, 0.74, Mar 22, 2026); also available for Average Correlation and Average Delta.
Evaluation Results
Method | Configuration | Date | Pearson r | Average Correlation | Average Delta
ADARUBRIC-DA | Method=AdaRubric, Vari... | 2026.03 | 0.74 | 0.767 | 0.147
ADARUBRIC-GM | Method=AdaRubric, Vari... | 2026.03 | 0.71 | 0.737 | 0.117
ADARUBRIC-WM | Method=AdaRubric, Vari... | 2026.03 | 0.7 | 0.72 | 0.1
GPT-4 Direct | Method=GPT-4 Direct, c... | 2026.03 | 0.6 | 0.62 | -
Prometheus | Method=Prometheus | 2026.03 | 0.57 | 0.59 | 0.05
G-Eval | Method=G-Eval | 2026.03 | 0.49 | 0.517 | 0.123
BERTScore | Method=BERTScore | 2026.03 | 0.39 | 0.41 | 0.23
ROUGE-L | Method=ROUGE-L | 2026.03 | 0.26 | 0.287 | 0.353
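Pearson r, the headline metric above, measures how linearly an evaluator's scores track human ratings over the same set of items. What follows is a minimal sketch of that computation, assuming paired per-item scores. The function name `pearson_r` and the toy data are hypothetical. They are not taken from ToolBench or from any listed method, and this page does not say how its numbers are aggregated.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each sequence
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: an evaluator's scores vs. human ratings on five items
evaluator_scores = [4.0, 3.5, 2.0, 5.0, 1.0]
human_ratings = [4, 3, 2, 5, 2]
print(round(pearson_r(evaluator_scores, human_ratings), 3))
```

A value of 1 means the evaluator's scores are a perfect increasing linear function of the human ratings. A value near 0 means there is no linear relationship between them.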