Multi-agent Planning and Composition on TaskBench Multimedia APIs (test)
[Chart: Precision over time. AutoMAS: 91.12 on May 5, 2026. Selectable metrics: Precision, Recall, F1-Score, Task-ArgName F1, Edit Distance, Sequence Similarity, Type Accuracy, N-Tools Accuracy.]
Updated 28d ago
Evaluation Results
| Method | Precision | Recall | F1-Score | Task-ArgName F1 | Edit Distance | Sequence Similarity | Type Accuracy | N-Tools Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AutoMAS (Strategy=ReAct Style,..., 2026.05) | 91.12 | 78.88 | 84.55 | 45.24 | 0.1051 | 90.12 | 87.52 | 74.01 |
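
The page does not say how F1-Score is aggregated. As a rough sanity check, the sketch below assumes the standard definition (harmonic mean of precision and recall) and applies it to the reported AutoMAS values. The function name `f1_score` and the `reported` dictionary are illustrative, not part of the benchmark's tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the AutoMAS row; percentages are kept as-is.
reported = {"precision": 91.12, "recall": 78.88, "f1": 84.55}

derived = f1_score(reported["precision"], reported["recall"])
print(f"derived F1 = {derived:.2f}, reported F1 = {reported['f1']:.2f}")
```

The derived value comes out at about 84.56, within 0.01 of the reported 84.55. The small gap may come from rounding of the displayed values or from averaging F1 per sample rather than computing it from aggregate precision and recall; the page does not confirm either.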