Multi-step Reasoning on StrategyQA
Top result: Qwen3-8B + SFT + WeMask(TF), 66.99 accuracy (May 8, 2026)
Updated 22d ago
Evaluation Results
Method                         Settings                     Date      Accuracy
Qwen3-8B + SFT + WeMask(TF)    shot=0-shot, mask rate...    2026.05   66.99
Qwen3-8B + SFT                 shot=0-shot, mask rate...    2026.05   66.77
Qwen3-8B + WeMask(SFT)         shot=0-shot, mask rate...    2026.05   66.69
Qwen3-4B + WeMask(SFT)         shot=0-shot, mask rate...    2026.05   64.54
Qwen3-4B + SFT + WeMask(TF)    shot=0-shot, mask rate...    2026.05   64.24
Qwen3-4B + SFT                 shot=0-shot, mask rate...    2026.05   64.15
Qwen3-4B + Gated Attention     shot=0-shot, mask rate...    2026.05   62.62
Qwen3-8B + Gated Attention     shot=0-shot, mask rate...    2026.05   62.53
Qwen3-8B                       shot=0-shot, mask rate...    2026.05   53.32
Qwen3-4B                       shot=0-shot, mask rate...    2026.05   46.77