Out-of-Distribution Generalization on AIME'24
[Chart: Saving Score by date. Top entry: TTT QK (dh=128), Saving Score 29.5, Apr 1, 2026. Selectable metrics: Saving Score, Error Rate.]
Updated 16d ago
Evaluation Results
| Method | Labeling protocol | Date | Saving Score | Error Rate |
|---|---|---|---|---|
| TTT QK (dh=128) | Supe... | 2026.04 | 29.5 | 10 |
| TTT no-QK | Supe... | 2026.04 | 29.3 | 15 |
| TTT QK (dh=128) | Cons... | 2026.04 | 18.5 | 3.3 |
| Static Probe | Supe... | 2026.04 | 15.8 | 5 |
| TTT no-QK | Cons... | 2026.04 | 14.1 | 3.3 |
| Static Probe | Cons... | 2026.04 | 11.8 | 3.3 |