
Multistep Soft Reasoning on MUSR (Accuracy %)

Accuracy (Multi-choice): 50.77

TMAP

[Chart: accuracy (%) by date; date shown: May 10, 2026]
Updated 22d ago

Evaluation Results

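| Date | Accuracy (Multi-choice) | (unlabeled) |
|---------|-------|------|
| 2026.05 | 50.77 | - |
| 2026.05 | 47.89 | - |
| 2026.05 | 47.49 | - |
| 2026.05 | 46.43 | - |
| 2026.05 | 46.3 | - |
| 2026.05 | 46.16 | - |
| 2026.05 | 46.03 | - |
| 2026.05 | 45.9 | - |
| 2026.05 | 45.9 | - |
| 2026.05 | 45.44 | - |
| 2026.05 | 44.84 | - |
| 2026.05 | 44.56 | - |
| 2026.05 | 44.44 | - |
| 2026.05 | 44.02 | - |
| 2026.05 | 43.92 | - |
| 2026.05 | 43.77 | - |
| 2026.05 | 43.67 | - |
| 2026.05 | 43.52 | - |
| 2026.05 | 42.99 | - |
| 2026.05 | 42.86 | - |
| 2026.05 | 42.72 | - |
| 2026.05 | 41.8 | - |
| 2026.05 | 41.67 | - |
| 2026.05 | 40.21 | - |
| 2026.05 | 38.89 | - |
| 2026.05 | 37.96 | - |
| 2026.05 | 37.3 | - |
| 2025.09 | - | 43.1 |
| 2025.09 | - | 39.8 |
| 2025.09 | - | 42.1 |
| 2025.09 | - | 41.4 |
| 2025.09 | - | 38.4 |
| 2025.09 | - | 37.7 |
| 2025.09 | - | 38.1 |
| 2025.09 | - | 40.7 |
| 2025.09 | - | 41.5 |
| 2025.09 | - | 40.7 |
| 2025.09 | - | 39.7 |
| 2025.09 | - | 38 |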