Knowledge-Intensive Reasoning on Knowledge-Intensive Reasoning Suite (2Wiki, Bamb, HQA, MuSi, SimQA)

58.4 2Wiki Score

Qwen2.5-7B + SFT-then-RL

Chart: 2Wiki Score, Apr 10, 2026
Updated 6d ago

Evaluation Results

Date     2Wiki  Bamb  HQA   MuSi  SimQA
2026.04  58.4   53.6  52.3  27    30.7
2026.04  57.4   55.3  55.6  29.7  32.7
2026.04  56.3   54.2  56    32.2  33.5
2026.04  55.7   49.6  50.1  24.7  31.5
2026.04  54.3   47.6  50    26.5  29.3
2026.04  54.3   50.9  49.6  28.5  31.6
2026.04  54.2   47.3  50.5  25.3  30.6
2026.04  53.8   55.4  53.8  30.2  33.1
2026.04  52.3   50.3  51.4  26.9  32.6
2026.04  52.3   50.3  51.4  26.9  32.6
2026.04  48.6   44.1  52.4  27.8  30.3
2026.04  46.7   50.8  48.8  23.2  31.1
2026.04  44.9   44.2  50.1  22.2  31.4
2026.04  44.1   43.7  50.5  26.7  29
2026.04  42.6   43.7  48.7  22.6  26.5
2026.04  33.3   37.9  37.1  14.6  23.2
2026.04  32.9   41.1  38.2  15.9  21.7
2026.04  30.3   35.2  35.8  16.3  19.5
2026.04  27.4   38.5  29.5  12.5  10.8
2026.04  24     25.4  17.9  8.3   6.5
2026.04  19.2   27.5  11.5  6.3   5.9
2026.04  19.2   27.5  11.5  6.3   5.9
2026.04  18.2   29.6  15.3  7.8   6.4
2026.04  16.9   38.4  12.1  13.3  6.3
2026.04  16.1   24.2  16.5  7.2   8.8