
Meta-evaluation on AgentEvalBench 1.0 (test)

Top result: 85 URF (EvalAgent)

[Chart: scores over time, May 12, 2026]
Updated 21d ago

Evaluation Results

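| Method | Date | URF | | | | | |
|---|---|---|---|---|---|---|---|
| | 2026.05 | 85 | 96.2 | 95 | - | - | 97.4 |
| | 2026.05 | 85 | 90 | 88.8 | - | - | 90 |
| | 2026.05 | 83.8 | 93.8 | 96.2 | - | - | 94.9 |
| | 2026.05 | 80 | 98.8 | 97.5 | - | - | 100 |
| | 2026.05 | 71.2 | 78.8 | 81.2 | - | - | 84.2 |
| | 2026.05 | 68.8 | 92.5 | 96.2 | - | - | 92.5 |
| | 2026.05 | 63.8 | 87.5 | 92.5 | 83.8 | 61.3 | 90 |
| | 2026.05 | 60 | 83.8 | 92.5 | 75 | 68.8 | 87.5 |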