Multi-agent Planning and Composition on TaskBench Dailylife APIs (test)
Chart: best Precision, 95.01 (AutoMAS, May 5, 2026)
Updated 28d ago
Evaluation Results
| Method | Setting | Date | Precision | Recall | F1-Score | Task-ArgName F1 | Edit Distance | Sequence Similarity | Type Accuracy | N-Tools Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| AutoMAS | Strategy=ReAct Style,... | 2026.05 | 95.01 | 88.45 | 91.61 | 89.13 | 0.1031 | 90.34 | 62.12 | 79.54 |
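The reported F1-Score for AutoMAS matches the harmonic mean of its listed Precision and Recall. Below is a quick consistency check in Python. It assumes the standard definition F1 = 2PR / (P + R) applied to the aggregate percentages; the page does not say how the benchmark averages F1 across samples, so treat this as a sanity check rather than the official scoring code. The function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1).

    Both inputs must use the same scale (here, percentages).
    Returns 0.0 when both are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# AutoMAS values from the evaluation results (in %).
precision, recall = 95.01, 88.45
print(f"{f1_score(precision, recall):.2f}")  # 91.61
```

Rounded to two decimals, the result reproduces the 91.61 shown in the F1-Score column.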