Long-horizon agentic task on ResearchRubrics

Performance: 49.36

AggAgent

Updated 1mo ago

Evaluation Results

Method	Links	Performance
AggAgent 2026.04		49.36
AggAgent 2026.04		45.42
AggAgent 2026.04		45.31
Solution Aggregation 2026.04		44
Best-of-N 2026.04		42.37
Solution Aggregation 2026.04		42.1
Pass@1 2026.04		40.5
Summary Aggregation 2026.04		40.29
Pass@1 2026.04		39.97
Fewest Tool Calls 2026.04		39.58
Best-of-N 2026.04		39
Fewest Tool Calls 2026.04		38.44
Best-of-N 2026.04		37.7
Pass@1 2026.04		37.47
Summary Aggregation 2026.04		37.47
Solution Aggregation 2026.04		36.84
Fewest Tool Calls 2026.04		35.21
Summary Aggregation 2026.04		31.72