List operations evaluation on ListOps (3, 14) (test)
Top result: RIR-GRC, 79.1 Mean Accuracy
[Chart: Mean Accuracy and Longest Bin; data point dated May 25, 2026]
Updated 8d ago
Evaluation Results
| Method | Configuration | Date | Mean Accuracy | Longest Bin |
| --- | --- | --- | --- | --- |
| RIR-GRC | | 2026.05 | 79.1 | 66.4 |
| BBT-GRC | | 2026.05 | 79.1 | 67.7 |
| MLP-LDRU | | 2026.05 | 69.7 | 65.2 |
| TF (ALiBi) | Positional Encoding=ALiBi | 2026.05 | 63.3 | 46.4 |
| LSTM | | 2026.05 | 62.7 | 51.4 |
| TF (NoPE) | Positional Encoding=None | 2026.05 | 37.6 | 31.2 |
| TF (Sin.) | Positional Encoding=Si... | 2026.05 | 25.5 | 8.4 |