
MultiAgentBench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-agent research collaboration | MultiAgentBench Research | Task Performance: 75.99 | 6
Collaborative software engineering | MultiAgentBench Coding (Tree) | Task Performance: 52.98 | 6
Collaborative software engineering | MultiAgentBench Coding Graph | Task Performance: 57.41 | 6
Multi-agent negotiation | MultiAgentBench Bargaining | Task Performance: 60.48 | 6