Meta-evaluation on AgentEvalBench 1.0 (test)
[Chart: URF by method. Highlighted point: EvalAgent, URF 85 (May 12, 2026). Selectable metrics: URF, MR, CQC, PQ, PCA, Overall Win-Tie Rate.]
Updated 21d ago
Evaluation Results
| Method    | Configuration              | Date    | URF  | MR   | CQC  | PQ   | PCA  | Overall Win-Tie Rate |
|-----------|----------------------------|---------|------|------|------|------|------|----------------------|
| EvalAgent | Backbone=Haiku 4.5, Co...  | 2026.05 | 85   | 96.2 | 95   | -    | -    | 97.4                 |
| EvalAgent | Backbone=Sonnet 4.5, C...  | 2026.05 | 85   | 90   | 88.8 | -    | -    | 90                   |
| EvalAgent | Backbone=Sonnet 4.5, C...  | 2026.05 | 83.8 | 93.8 | 96.2 | -    | -    | 94.9                 |
| EvalAgent | Backbone=Haiku 4.5, Co...  | 2026.05 | 80   | 98.8 | 97.5 | -    | -    | 100                  |
| EvalAgent | Backbone=Sonnet 4.5, C...  | 2026.05 | 71.2 | 78.8 | 81.2 | -    | -    | 84.2                 |
| EvalAgent | Backbone=Haiku 4.5, Co...  | 2026.05 | 68.8 | 92.5 | 96.2 | -    | -    | 92.5                 |
| EvalAgent | Backbone=Sonnet 4.5, C...  | 2026.05 | 63.8 | 87.5 | 92.5 | 83.8 | 61.3 | 90                   |
| EvalAgent | Backbone=Haiku 4.5, Co...  | 2026.05 | 60   | 83.8 | 92.5 | 75   | 68.8 | 87.5                 |