Multi-step Narrative Reasoning on MUSR (ACC, TOK, η)

Accuracy: 65.86

Qwen3-4B-Thinking-2507 + DiffAdapt

May 12, 2026
Updated 21d ago

Evaluation Results

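| Date | ACC | TOK | η |
|---|---|---|---|
| 2026.05 | 65.86 | 6,841 | 0.909 |
| 2026.05 | 64.42 | 6,081 | 1 |
| 2026.05 | 63.84 | 1,188 | 5.073 |
| 2026.05 | 62.02 | 5,351 | 1.094 |
| 2026.05 | 61.31 | 2,118 | 2.732 |
| 2026.05 | 57.88 | 2,529 | 2.16 |
| 2026.05 | 57.78 | 4,103 | 1.329 |
| 2026.05 | 47.68 | 3,906 | 1.152 |
| 2026.05 | 46.3 | 4,633 | 1 |
| 2026.05 | 44.44 | 5,173 | 1 |
| 2026.05 | 41.67 | 3,238 | 1.215 |
| 2026.05 | 41.01 | 2,216 | 1.852 |
| 2026.05 | 40.4 | 2,764 | 1.701 |
| 2026.05 | 40.2 | 4,229 | 1.107 |
| 2026.05 | 39.29 | 3,393 | 1.159 |
| 2026.05 | 39.09 | 4,783 | 0.951 |
| 2026.05 | 38.99 | 4,952 | 0.917 |
| 2026.05 | 38.36 | 3,574 | 1.074 |
| 2026.05 | 36.16 | 5,716 | 0.736 |
| 2026.05 | 34.34 | 4,008 | 0.997 |
| 2026.05 | 29.9 | 4,869 | 0.715 |
| 2026.05 | 23.13 | 4,619 | 0.583 |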
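
The page does not define η. The reported values appear consistent with a relative efficiency: accuracy per token, divided by the accuracy per token of a baseline row (one listed with η = 1). This reading is inferred from the numbers, not stated by the source, and the function name `relative_efficiency` is hypothetical. A minimal sketch under that assumption, using rows copied from the results above and the 64.42 / 6,081 row as the assumed baseline:

```python
def relative_efficiency(acc, tok, base_acc, base_tok):
    """Accuracy per token, normalized by the baseline's accuracy per token.

    Assumed reading of the η column; the source does not define it.
    """
    return (acc / tok) / (base_acc / base_tok)


# Assumed baseline: the row reported with η = 1 (ACC 64.42, TOK 6,081).
BASELINE = (64.42, 6081)

# (ACC, TOK, reported η) copied from the results above.
ROWS = [
    (65.86, 6841, 0.909),
    (63.84, 1188, 5.073),
    (62.02, 5351, 1.094),
]

for acc, tok, reported in ROWS:
    computed = relative_efficiency(acc, tok, *BASELINE)
    print(f"ACC={acc} TOK={tok} computed={computed:.3f} reported={reported}")
```

For these three rows the computed value agrees with the reported η to within rounding. Other rows seem to be normalized against different η = 1 rows, so the baseline has to be chosen per row group.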