
Knowledge-Intensive Reasoning on MuSiQue (F1 score)

34.8 F1 Score

Llama3.1-8B + ARPO

[Chart: F1 score over time, Dec 11, 2025 to Jun 1, 2026]
Updated 1d ago

Evaluation Results

Date      F1     (unlabeled)  (unlabeled)
2025.12   34.8   -            -
2026.06   33.1   22           2.46
2026.06   32.8   16           2.58
2026.06   32.1   17.5         3.1
2025.12   31.1   -            -
2025.12   31     -            -
2026.05   31     -            -
2026.06   30.7   15.4         2.26
2025.12   30.6   -            -
2026.06   30.6   16.7         3.07
2026.06   30.2   15.9         2.37
2025.12   30     -            -
2025.12   29.9   -            -
2025.12   29.2   -            -
2025.12   28.7   -            -
2025.12   28.6   -            -
2026.05   28.3   -            -
2025.12   27.9   -            -
2026.05   27.5   -            -
2025.12   25.2   -            -
2026.06   25.1   13.5         4.71
2025.12   24.7   -            -
2026.06   23.4   17.5         2.26
2026.06   22.5   12           2.33
2026.05   21.2   -            -
2026.05   20.9   -            -
2026.06   19.4   11.5         2.62
2026.06   17.8   9.7          2.67
2026.06   17.7   9.8          2.48
2026.06   16.8   11           4.07
2026.06   16     7.8          2.51
2026.06   15.7   8.5          2.52
2025.12   15.5   -            -
2025.12   10.4   -            -
2026.06   9.8    7.4          -
2025.12   9.5    -            -
2026.06   7.7    2.5          2.56
2025.12   6.6    -            -
2026.06   6.5    2.5          -
2025.12   6.1    -            -
2026.06   6.1    3.2          1.59
2025.12   3.6    -            -
2026.06   3.6    0.5          -