Multi-step Reasoning on StrategyQA

Accuracy: 66.99

Qwen3-8B + SFT + WeMask(TF)

[Chart: accuracy by date; accuracy axis ticks 45.9612, 51.4206, 56.88, 62.3394; date shown: May 8, 2026]
Updated 22d ago

Evaluation Results

Date      Accuracy
2026.05   66.99
2026.05   66.77
2026.05   66.69
2026.05   64.54
2026.05   64.24
2026.05   64.15
2026.05   62.62
2026.05   62.53
2026.05   53.32
2026.05   46.77