Human Correlation on AgentBench
[Chart: Pearson r. Top result: ADARUBRIC-DA, Pearson r 0.77, Mar 22, 2026. Metrics shown: Pearson r, Average Score, Delta.]
Updated 25d ago
Evaluation Results
| Method | Configuration | Date | Pearson r | Average Score | Delta |
|---|---|---|---|---|---|
| ADARUBRIC-DA | Method=AdaRubric, Vari... | 2026.03 | 0.77 | 0.767 | 0.147 |
| ADARUBRIC-GM | Method=AdaRubric, Vari... | 2026.03 | 0.74 | 0.737 | 0.117 |
| ADARUBRIC-WM | Method=AdaRubric, Vari... | 2026.03 | 0.72 | 0.72 | 0.1 |
| GPT-4 Direct | Method=GPT-4 Direct, c... | 2026.03 | 0.62 | 0.62 | - |
| Prometheus | Method=Prometheus | 2026.03 | 0.59 | 0.59 | 0.05 |
| G-Eval | Method=G-Eval | 2026.03 | 0.52 | 0.517 | 0.123 |
| BERTScore | Method=BERTScore | 2026.03 | 0.41 | 0.41 | 0.23 |
| ROUGE-L | Method=ROUGE-L | 2026.03 | 0.29 | 0.287 | 0.353 |