ListOps

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
--- | --- | --- | --- | ---
Logical Reasoning | ListOps | Accuracy | 78 | 32
Sequence Classification | ListOps | Accuracy (%) | 61.39 | 29
Hierarchical Reasoning | ListOps Long Range Arena (test) | Accuracy | 63.04 | 26
Hierarchical reasoning on symbolic sequences | Long ListOps (test) | Accuracy | 62.75 | 22
Logical Expression Evaluation | ListOps-O Argument Generalization (Arguments 15) | Accuracy | 79 | 11
Logical Expression Evaluation | ListOps-O Argument Generalization (Arguments 10) | Accuracy | 0.8415 (i.e., 84.15%) | 11
Logical Expression Evaluation | ListOps-O Length Generalization (Lengths 900-1000) | Accuracy | 99.5 | 11
Logical Expression Evaluation | ListOps-O Length Generalization (Lengths 500-600) | Accuracy | 99.4 | 11
Logical Expression Evaluation | ListOps-O Length Generalization (Lengths 200-300) | Accuracy | 99.9 | 11
Logical Expression Evaluation | ListOps-O near-IID (Lengths < 1000, Arguments < 5) | Accuracy | 99.9 | 11
List operations evaluation | ListOps (5, 14) (test) | Mean Accuracy | 53.1 | 7
List operations evaluation | ListOps (5, 9) (test) | Mean Accuracy | 49.6 | 7
List operations evaluation | ListOps (3, 14) (test) | Mean Accuracy | 79.1 | 7
List operations evaluation | ListOps (3, 9) (test) | Mean Accuracy | 89.9 | 7
Mathematical Expression Evaluation | ListOps Long Range Arena (test) | Accuracy | 41.4 | 7
Logical operations parsing | ListOps mid L1024 | Accuracy | 85.4 | 6
Long-range sequence modeling | ListOps Long Range Arena (LRA) 2K (test) | Accuracy | 37.9 | 6
Long-range sequence modeling | ListOpsMix (test) | Accuracy | 70.43 | 5
Unsupervised Parsing | ListOps (test) | Accuracy | 68.07 | 5
Unsupervised Parsing | ListOps (val) | Accuracy | 67.65 | 5
Unsupervised Parsing | ListOps simplified (test) | Accuracy (Max) | 93.78 | 4
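Every benchmark above is built on ListOps-style data. In the original ListOps task, a model reads a nested expression of list operations over single digits and predicts its single-digit value. The following is a minimal Python sketch of such an evaluator, intended only to illustrate the input format. It assumes the operator set of the original ListOps task (MAX, MIN, MED for median, and SM for sum modulo 10) and a bracket notation in which an operator token such as `[MAX` opens a sub-expression and `]` closes it. The `evaluate` function name and the rule for MED with an even number of arguments (truncate the median to an integer) are assumptions made for illustration, not taken from any benchmark's official code. Individual variants (ListOps-O, ListOpsMix, Long ListOps) may differ in format.

```python
from statistics import median

DIGITS = set("0123456789")

# Assumed operator set from the original ListOps task; each operator maps a
# non-empty list of digits 0-9 to a single digit 0-9.
OPS = {
    "[MAX": max,
    "[MIN": min,
    # Assumption: with an even argument count, the median (average of the two
    # middle values) is truncated to an integer.
    "[MED": lambda args: int(median(args)),
    "[SM": lambda args: sum(args) % 10,  # sum modulo 10
}


def evaluate(expression: str) -> int:
    """Evaluate a bracketed ListOps-style expression such as "[MAX 2 9 ]"."""
    stack: list[tuple[str, list[int]]] = []  # (operator, arguments collected so far)
    result = None
    for token in expression.split():
        if result is not None:
            raise ValueError("trailing tokens after the top-level expression")
        if token in OPS:
            # An operator token opens a new sub-expression.
            stack.append((token, []))
        elif token == "]":
            # Close the innermost sub-expression and pass its value to the parent.
            if not stack:
                raise ValueError("unmatched ']'")
            op, args = stack.pop()
            if not args:
                raise ValueError(f"{op} has no arguments")
            value = OPS[op](args)
            if stack:
                stack[-1][1].append(value)
            else:
                result = value
        elif token in DIGITS:
            if not stack:
                raise ValueError("digit outside any operator")
            stack[-1][1].append(int(token))
        else:
            raise ValueError(f"unexpected token: {token!r}")
    if stack or result is None:
        raise ValueError("unbalanced expression")
    return result


print(evaluate("[MAX 2 9 [MIN 4 7 ] 0 ]"))  # 9
print(evaluate("[SM 5 [MED 1 3 8 ] 4 ]"))   # (5 + 3 + 4) % 10 = 2
```

A stack keeps the evaluation to a single left-to-right pass, which mirrors why the task stresses hierarchical reasoning: the value of each operator depends on fully resolving every nested sub-expression before its closing bracket.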