Agent Evaluation Dataset (20 agents x 2 requirement types)

0.68 Time (min)

LLM-Singleturn

Updated 2mo ago

Evaluation Results

Method (date)	Links	Time (min)	(unlabeled)
LLM-Singleturn 2026.05		0.68	50.26
LLM-Singleturn 2026.05		1.36	49.99
EvalAgent 2026.05		3.98	2,872.07
Agent-Sourcecode 2026.05		4.03	869.36
EvalAgent 2026.05		4.18	2,094.98
Agent-Onestage 2026.05		4.39	1,698.44
Agent-Sourcecode 2026.05		4.54	2,196.16
Agent-Onestage 2026.05		5.3	3,579.46
Agent-Twostage 2026.05		6.72	4,049.63
Agent-Twostage 2026.05		10	3,023.59