Multistep Soft Reasoning on MuSR

Accuracy: 69

GPT-OSS-120B

[Chart: accuracy over time, Dec 19, 2024 to Apr 1, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
2026.04  69     12,700  -       -
2026.04  64.83  7,195   3.81    46.38
2026.04  62.45  13,419  -       -
2026.04  59.5   5,994   -13.77  52.8
2026.04  50.65  1,328   -       -
2024.12  46.39  -       -       -
2024.12  45.58  -       -       -
2024.12  44.91  -       -       -
2024.12  44.79  -       -       -
2024.12  44.51  -       -       -
2024.12  43.19  -       -       -
2024.12  43.11  -       -       -
2024.12  42.06  -       -       -
2024.12  41.74  -       -       -
2024.12  41.7   -       -       -
2024.12  41.65  -       -       -
2024.12  41.59  -       -       -
2024.12  41.11  -       -       -
2024.12  41.07  -       -       -
2024.12  40.94  -       -       -
2024.12  40.73  -       -       -
2024.12  40.72  -       -       -
2024.12  40.68  -       -       -
2024.12  40.59  -       -       -
2024.12  40.57  -       -       -
2024.12  40.32  -       -       -
2024.12  39.92  -       -       -
2024.12  39.77  -       -       -
2024.12  39.46  -       -       -
2024.12  39.32  -       -       -
2024.12  39.05  -       -       -
2024.12  39.05  -       -       -
2024.12  38.81  -       -       -
2024.12  37.99  -       -       -
2024.12  37.92  -       -       -
2024.12  37.6   -       -       -
2026.04  37.41  25,600  -       -
2024.12  37.32  -       -       -
2026.04  34     5,912   -9.12   76.91
2026.04  29     1,603   -       -
2026.04  10.33  1,061   -       -