Meta-evaluation on AgentEvalBench

83.8 URF

EvalAgent

May 12, 2026
Updated 21d ago

Evaluation Results

Method  Links
2026.05  83.8  90    92.5  -     -     90
2026.05  81.2  92.5  98.8  -     -     95
2026.05  80    93.8  93.8  -     -     95
2026.05  76.2  85    92.5  -     -     85
2026.05  63.7  83.8  91.2  -     -     85
2026.05  63.7  73.8  76.2  -     -     76.2
2026.05  60    81.2  87.5  73.8  56.2  85
2026.05  51.2  85    95    85    53.8  90