
Stack

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Abstractive Summarization | Stack ConvoSumm 1.0 (test) | ROUGE-1: 39.73 | 11 |
| Object Stacking | Stack Composition C (test) | Success Rate: 93.7 | 10 |
| Object Stacking | Stack Spuriousness S (test) | Success Rate: 97.6 | 10 |
| Object Stacking | Stack In-distribution I (test) | Success Rate: 97.2 | 10 |
| Robotic Manipulation | Stack Shifted Environment (test) | Testing Reward: 0.77 | 8 |
| Task Planning | Stack 1.0 (test) | Average Planning Time Cost (s): 5.94 | 3 |