Out-of-Distribution Generalization on MATH-500 OOD (test)
[Chart: Score (Sav.) over time. Best result: TTT QK (dh=128), 67, Apr 1, 2026.]
Updated 16d ago
Evaluation Results
Method            Configuration               Links     Score (Sav.)   Error Rate
TTT QK (dh=128)   Labeling protocol=Supe...   2026.04   67             2.1
TTT no-QK         Labeling protocol=Supe...   2026.04   63.7           2.3
TTT QK (dh=128)   Labeling protocol=Cons...   2026.04   63.7           1.6
TTT no-QK         Labeling protocol=Cons...   2026.04   55.5           1.2
Static Probe      Labeling protocol=Supe...   2026.04   24.8           0.8
Static Probe      Labeling protocol=Cons...   2026.04   23.9           0.4