
AWS Benchmark

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-agent Task Fulfillment | AWS Benchmark Mortgage scenario | Goal Success Rate (User): 62.07 | 2 |
| Multi-agent Task Fulfillment | AWS Benchmark Travel scenario | User GSR: 78.79 | 2 |