Multi-step Reasoning on StrategyQA (test)
Best result: 64.63 Accuracy (Qwen3-4B + SFT + WeMask(TF), May 8, 2026)
Updated 22d ago
Evaluation Results
| Method                                                | Links   | Accuracy |
|-------------------------------------------------------|---------|----------|
| Qwen3-4B + SFT + WeMask(TF) (Mask Rate=0.3, Trainin...) | 2026.05 | 64.63    |
| Qwen3-4B + WeMask(SFT) (Mask Rate=0.1, Trainin...)      | 2026.05 | 64.54    |
| Qwen3-4B + SFT + WeMask(TF) (Mask Rate=0.1, Trainin...) | 2026.05 | 64.24    |
| Qwen3-4B + SFT + WeMask(TF) (Mask Rate=0.7, Trainin...) | 2026.05 | 64.22    |
| Qwen3-4B + SFT + WeMask(TF) (Mask Rate=0.5, Trainin...) | 2026.05 | 64.19    |
| Qwen3-4B + SFT (Mask Rate=-, Training...)               | 2026.05 | 64.15    |
| Qwen3-4B + WeMask(SFT) (Mask Rate=0.3, Trainin...)      | 2026.05 | 64.06    |
| Qwen3-4B + WeMask(SFT) (Mask Rate=0.5, Trainin...)      | 2026.05 | 63.76    |
| Qwen3-4B + WeMask(SFT) (Mask Rate=0.7, Trainin...)      | 2026.05 | 63.62    |
| Qwen3-4B + WeMask(SFT) (Mask Rate=1.0, Trainin...)      | 2026.05 | 62.05    |
| Qwen3-4B + SFT + WeMask(TF) (Mask Rate=1.0, Trainin...) | 2026.05 | 61.75    |