
AgentNoiseBench

Benchmarks

Task Name         | Dataset Name                                   | SOTA Result                 | Trend
Agent Performance | AgentNoiseBench-Vita Noisy setting 1.0 (test)  | Delivery Avg@4 Score: 28.75 | 10
Agent Performance | AgentNoiseBench Noisy setting tau2 1.0 (test)  | Retail Avg@4 Score: 43.2    | 10