Multistep Soft Reasoning on MuSR

Accuracy: 69

GPT-OSS-120B

[Chart: axis ticks 7.9832, 23.8241, 39.665, 55.5059; date tick Apr 1, 2026]
Updated 16d ago

Evaluation Results

Date      Accuracy  (unlabeled)  (unlabeled)  (unlabeled)  Method  Links
2026.04   69        12,700       -            -
2026.04   64.83     7,195        3.81         46.38
2026.04   62.45     13,419       -            -
2026.04   59.5      5,994        -13.77       52.8
2026.04   50.65     1,328        -            -
2026.04   37.41     25,600       -            -
2026.04   34        5,912        -9.12        76.91
2026.04   29        1,603        -            -
2026.04   10.33     1,061        -            -