
Automating Agent Evaluation on AgentEvalBench

Top entry: EvalAgent (Eval@1 = 65)

[Figure: Eval@1 results chart, dated May 12, 2026]

Evaluation Results

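Method          Date      Eval@1
EvalAgent       2026.05   65
(not listed)    2026.05   62.5
(not listed)    2026.05   60
(not listed)    2026.05   45
(not listed)    2026.05   35
(not listed)    2026.05   32.5
(not listed)    2026.05   30
(not listed)    2026.05   17.5
(not listed)    2026.05   17.5
(not listed)    2026.05   15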