Long-horizon agentic task on HealthBench Hard

Performance: 28.06

AggAgent

Updated 1mo ago

Evaluation Results

Method	Date	Performance
AggAgent	2026.04	28.06
AggAgent	2026.04	27.99
Solution Aggregation	2026.04	26.30
AggAgent	2026.04	24.46
Summary Aggregation	2026.04	23.00
Solution Aggregation	2026.04	21.84
Summary Aggregation	2026.04	16.92
Solution Aggregation	2026.04	15.72
Best-of-N	2026.04	13.01
Pass@1	2026.04	12.87
Fewest Tool Calls	2026.04	12.83
Best-of-N	2026.04	9.91
Pass@1	2026.04	9.67
Fewest Tool Calls	2026.04	8.90
Best-of-N	2026.04	8.79
Pass@1	2026.04	8.67
Summary Aggregation	2026.04	7.35
Fewest Tool Calls	2026.04	5.34