
Stack

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Conversation Summarization | Stack | QAGS 57.75 | 25 |
| Opinion Diversity Coverage | Stack | Coverage 75 | 15 |
| Stack push-pop state tracking | Stack | Accuracy 99.98 | 12 |
| Abstractive Summarization | Stack ConvoSumm 1.0 (test) | ROUGE-1 39.73 | 11 |
| Object Stacking | Stack Composition C (test) | Success Rate 93.7 | 10 |
| Object Stacking | Stack Spuriousness S (test) | Success Rate 97.6 | 10 |
| Object Stacking | Stack In-distribution I (test) | Success Rate 97.2 | 10 |
| Robotic Manipulation | Stack Shifted Environment (test) | Testing Reward 0.77 | 8 |
| Dynamic Link Prediction | Stack ubuntu (inductive) | AUC-ROC 83.29 | 7 |
| Dynamic Link Prediction | Stack elec (inductive) | AUC-ROC 86.07 | 7 |
| Dynamic Link Prediction | Stack ubuntu (transductive) | AUC-ROC 96.49 | 7 |
| Dynamic Link Prediction | Stack elec (transductive) | AUC-ROC 97.98 | 7 |
| Classification | Stack Social axes V2 (test) | Group A Accuracy 70.5 | 5 |
| Task Planning | Stack 1.0 (test) | Average Planning Time Cost (s) 5.94 | 3 |
| Class Invariant Synthesis | stack | Total Invariants 6 | 1 |