List operations evaluation on ListOps (3, 9) (test)
Chart: Mean Accuracy by date (metric toggle: Mean Accuracy / Longest Bin Performance). Best result: RIR-GRC, 89.9, May 25, 2026.
Evaluation Results
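Method     | Configuration             | Date    | Mean Accuracy (%) | Longest Bin Performance (%)
-----------|---------------------------|---------|-------------------|----------------------------
RIR-GRC    |                           | 2026.05 | 89.9              | 79.2
BBT-GRC    |                           | 2026.05 | 84.1              | 72.5
MLP-LDRU   |                           | 2026.05 | 74.7              | 67.8
TF (ALiBi) | Positional Encoding=ALiBi | 2026.05 | 68.2              | 55.1
LSTM       |                           | 2026.05 | 65.9              | 50
TF (NoPE)  | Positional Encoding=None  | 2026.05 | 37.3              | 32.1
TF (Sin.)  | Positional Encoding=Si... | 2026.05 | 35.9              | 9.8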
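The page does not define its two metrics. The following is a minimal Python sketch of one plausible reading, assuming that test inputs are grouped into length bins, that Mean Accuracy is the unweighted mean of per-bin accuracies, and that Longest Bin Performance is the accuracy on the bin holding the longest inputs. The `LengthBin` class, its `max_length` field, the `summarize` function, and the toy tallies are all hypothetical; they do not reproduce any result listed above.

```python
from dataclasses import dataclass


@dataclass
class LengthBin:
    """Hypothetical per-bin tally: test inputs grouped by sequence length."""
    max_length: int  # assumed upper bound on input length in this bin
    correct: int     # number of correctly evaluated ListOps expressions
    total: int       # number of expressions in the bin

    @property
    def accuracy(self) -> float:
        # Accuracy for this bin, in percent.
        return 100.0 * self.correct / self.total


def summarize(bins: list[LengthBin]) -> tuple[float, float]:
    """Return (mean accuracy, longest-bin accuracy), both in percent.

    Assumption: "Mean Accuracy" is the unweighted mean over length bins and
    "Longest Bin Performance" is the accuracy on the bin with the largest
    max_length. The leaderboard page does not define either metric.
    """
    if not bins:
        raise ValueError("need at least one length bin")
    mean_acc = sum(b.accuracy for b in bins) / len(bins)
    longest = max(bins, key=lambda b: b.max_length)
    return mean_acc, longest.accuracy


if __name__ == "__main__":
    # Toy tallies for illustration only; not results for any method above.
    toy_bins = [
        LengthBin(max_length=100, correct=90, total=100),
        LengthBin(max_length=200, correct=80, total=100),
        LengthBin(max_length=300, correct=70, total=100),
    ]
    mean_acc, longest_acc = summarize(toy_bins)
    print(f"Mean Accuracy: {mean_acc:.1f}")                # 80.0
    print(f"Longest Bin Performance: {longest_acc:.1f}")  # 70.0
```

Averaging per-bin accuracies weights every length bin equally regardless of how many examples it holds; a pooled accuracy (total correct divided by total examples) would give a different number whenever bin sizes differ.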