
AGENT

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Agent Task | Agent | Accuracy: 100 | 16 |
| Goal-Oriented Navigation | AGENT new concepts from new initial states (test) | Accuracy: 73 | 9 |
| Agent interaction | Agent | Clean Success (Eager): 100 | 4 |