Math Reasoning on AIME 22-24
[Chart: Score over time. Top entry: Qwen3-4B + WeMask(SFT), Score 8.15, May 8, 2026.]
Evaluation Results
| Method | Settings | Date | Score |
| --- | --- | --- | --- |
| Qwen3-4B + WeMask(SFT) | Mask Rate=0.3, Trainin... | 2026.05 | 8.15 |
| Qwen3-4B + SFT + WeMask(TF) | Mask Rate=0.3, Trainin... | 2026.05 | 7.61 |
| Qwen3-4B + SFT + WeMask(TF) | Mask Rate=0.7, Trainin... | 2026.05 | 7.41 |
| Qwen3-4B + WeMask(SFT) | Mask Rate=0.5, Trainin... | 2026.05 | 6.67 |
| Qwen3-4B + SFT + WeMask(TF) | Mask Rate=0.1, Trainin... | 2026.05 | 6.3 |
| Qwen3-4B + SFT + WeMask(TF) | Mask Rate=0.5, Trainin... | 2026.05 | 5.93 |
| Qwen3-4B + SFT | Mask Rate=-, Training... | 2026.05 | 5.92 |
| Qwen3-4B + WeMask(SFT) | Mask Rate=0.7, Trainin... | 2026.05 | 5.92 |
| Qwen3-4B + WeMask(SFT) | Mask Rate=0.1, Trainin... | 2026.05 | 4.07 |
| Qwen3-4B + SFT + WeMask(TF) | Mask Rate=1.0, Trainin... | 2026.05 | 3.33 |
| Qwen3-4B + WeMask(SFT) | Mask Rate=1.0, Trainin... | 2026.05 | 1.48 |
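
For readers who want to compare the listed configurations against the plain Qwen3-4B + SFT entry (5.92), the following is a minimal sketch of that arithmetic. The scores are copied from the results listed above; the dictionary layout and the helper names `deltas_vs_baseline` and `best_mask_rate` are illustrative assumptions, not part of the leaderboard. The differences are plain subtractions and say nothing about statistical significance.

```python
# Scores copied from the Evaluation Results listing.
# Keys are (method, mask_rate); the plain SFT entry (no mask rate) is the baseline.
scores = {
    ("WeMask(SFT)", 0.1): 4.07,
    ("WeMask(SFT)", 0.3): 8.15,
    ("WeMask(SFT)", 0.5): 6.67,
    ("WeMask(SFT)", 0.7): 5.92,
    ("WeMask(SFT)", 1.0): 1.48,
    ("SFT + WeMask(TF)", 0.1): 6.3,
    ("SFT + WeMask(TF)", 0.3): 7.61,
    ("SFT + WeMask(TF)", 0.5): 5.93,
    ("SFT + WeMask(TF)", 0.7): 7.41,
    ("SFT + WeMask(TF)", 1.0): 3.33,
}
BASELINE = 5.92  # Qwen3-4B + SFT, Mask Rate=-


def deltas_vs_baseline(scores, baseline):
    """Return {(method, mask_rate): score - baseline}, rounded to 2 decimals."""
    return {key: round(value - baseline, 2) for key, value in scores.items()}


def best_mask_rate(scores, method):
    """Return the (mask_rate, score) pair with the highest score for one method."""
    candidates = [(rate, s) for (m, rate), s in scores.items() if m == method]
    return max(candidates, key=lambda pair: pair[1])


deltas = deltas_vs_baseline(scores, BASELINE)

# Print configurations from largest gain to largest loss relative to the baseline.
for (method, rate), delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{method:18s} mask_rate={rate:<4} delta={delta:+.2f}")
```

Calling `best_mask_rate(scores, "WeMask(SFT)")` picks out the mask rate with the highest listed score for that method, which makes it easy to see that both variants peak at a mask rate of 0.3 in this listing.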