Long-horizon agentic task on DeepSearchQA

Performance: 66

AggAgent

Updated 1mo ago

Evaluation Results

Method	Links	Performance
AggAgent 2026.04		66
AggAgent 2026.04		65.33
Summary Aggregation 2026.04		64
Best-of-N 2026.04		64
Solution Aggregation 2026.04		62.67
Solution Aggregation 2026.04		62
Summary Aggregation 2026.04		61.33
Best-of-N 2026.04		57.33
Fewest Tool Calls 2026.04		56
Fewest Tool Calls 2026.04		54.67
Pass@1 2026.04		54.42
AggAgent 2026.04		49.33
Pass@1 2026.04		49.25
Summary Aggregation 2026.04		47.33
Solution Aggregation 2026.04		46
Best-of-N 2026.04		35.33
Fewest Tool Calls 2026.04		33.33
Pass@1 2026.04		32.42