
Multi-choice Question Answering on LongBench v2

Overall Accuracy: 46.5

Qwen2.5-14B-1M-LongRLVR

[Chart: overall accuracy over time, May 19, 2025 to Mar 2, 2026]
Updated 1mo ago

Evaluation Results

Method    Links
Date      Overall
2026.03   46.5      55.6      43.3      38
2026.03   44.9      51.7      42.3      38.9
2026.03   43.5      43.5      48.9      40.9
2026.03   41        53.3      34.4      33.3
2026.03   40.2      51.7      34        33.3
2026.03   39.8      48.3      36.7      31.5
2026.03   39.6      48.9      34.9      33.3
2026.03   38.6      45.6      35.8      32.4
2026.03   37.6      43.3      28.8      32.4
2026.03   36.2      41.1      30.7      38.9
2025.05   34.79     40.56     34.88     25
2025.05   34.19     43.33     30.7      25.93
2025.05   34.19     41.11     33.49     24.07
2025.05   34.19     41.11     33.49     24.07
2025.05   33.8      40        33.49     24.07
2025.05   33.4      41.11     31.16     25
2026.03   33.2      36.7      32.6      28.7
2026.03   33        37.8      31.2      28.7
2025.05   32.6      40.56     32.09     20.37
2026.03   32.4      35.6      31.2      29.6
2026.03   32.4      37.2      29.3      30.6
2025.05   32.31     40.56     29.89     23.95
2026.03   31.2      36.1      28.4      28.7
2026.03   30.4      34.4      31.6      21.3
2025.05   29.22     34.44     27.91     23.15
2025.05   29.22     35        27.44     23.15
2025.05   28.63     33.89     26.98     23.15
2025.05   28.56     32.28     26.05     24.07
2025.05   28.43     33.33     27.44     22.22
2025.05   28.23     33.89     26.51     22.02
2025.05   27.63     36.67     22.79     22.22
2025.05   27.44     33.89     25.12     21.3
2025.05   27.44     36.11     23.72     20.37
2025.05   26.84     36.11     22.33     20.37
2025.05   26.84     34.44     22.33     23.15
2025.05   26.44     32.22     23.26     23.15
2025.05   26.24     35.56     21.86     19.44
2026.03   25.9      36.2      45        34
2025.05   25.84     32.22     20        26.85
2025.05   25.45     32.78     22.79     18.52
2025.05   25.25     32.78     21.86     19.44