Meta-evaluation on AgentEvalBench
Chart (URF by date): top entry EvalAgent at 83.8, May 12, 2026. Metrics shown: URF, MR, CQC, PQ, PCA, Overall Score.
Evaluation Results
| Method | Date | URF | MR | CQC | PQ | PCA | Overall Score |
|---|---|---|---|---|---|---|---|
| EvalAgent (Meta-Evaluator LLM=Son...) | 2026.05 | 83.8 | 90 | 92.5 | - | - | 90 |
| EvalAgent (Meta-Evaluator LLM=Hai...) | 2026.05 | 81.2 | 92.5 | 98.8 | - | - | 95 |
| EvalAgent (Meta-Evaluator LLM=Hai...) | 2026.05 | 80 | 93.8 | 93.8 | - | - | 95 |
| EvalAgent (Meta-Evaluator LLM=Son...) | 2026.05 | 76.2 | 85 | 92.5 | - | - | 85 |
| EvalAgent (Meta-Evaluator LLM=Hai...) | 2026.05 | 63.7 | 83.8 | 91.2 | - | - | 85 |
| EvalAgent (Meta-Evaluator LLM=Son...) | 2026.05 | 63.7 | 73.8 | 76.2 | - | - | 76.2 |
| EvalAgent (Meta-Evaluator LLM=Son...) | 2026.05 | 60 | 81.2 | 87.5 | 73.8 | 56.2 | 85 |
| EvalAgent (Meta-Evaluator LLM=Hai...) | 2026.05 | 51.2 | 85 | 95 | 85 | 53.8 | 90 |