In-context retrieval on DROP
[Chart: Accuracy over time, Mar 15, 2026 to Apr 10, 2026. Top result: BERT-Judge, 88.6 Accuracy.]
Evaluation Results
Method                           Details                    Date     Accuracy
BERT-Judge                       clean_name=BERT-as-a-J...  2026.04  88.6
Regex                            -                          2026.04  77
LLM-Judge                        backbone=Qwen-3 0.6B       2026.04  69.3
Transformer++                    Parameter scale=410M       2026.03  28.2
Hybrid Gated DeltaNet + M2RNN-3  Parameter scale=410M       2026.03  27.2
Hybrid M2RNN                     Parameter scale=410M       2026.03  26.8
Hybrid Mamba-2                   Parameter scale=410M       2026.03  26.4
Hybrid Mamba-2 + M2RNN-3         Parameter scale=410M       2026.03  26.4
Hybrid Mamba-2 + M2RNN-1         Parameter scale=410M       2026.03  26.3
Hybrid Gated DeltaNet            Parameter scale=410M       2026.03  26
Hybrid Gated DeltaNet + M2RNN-1  Parameter scale=410M       2026.03  25.6
Gated DeltaNet                   Parameter scale=410M       2026.03  24.9
M2RNN                            Parameter scale=410M       2026.03  22.9
Mamba-2                          Parameter scale=410M       2026.03  19.5
GRU                              Parameter scale=410M       2026.03  17.2
RNN                              Parameter scale=410M       2026.03  14.6