
Long-context language modeling on HELMET

Summarization Score: 247 (top result: Qwen2.5-14B + RLVR)

(Chart: Summarization Score over time, Sep 2025 to Feb 2026.)

Evaluation Results

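Date | Summarization Score | Col. 2 | Col. 3 | Col. 4 | Col. 5 | Col. 6
2025.09 | 247 | - | 420 | - | 46.7 | 25.5
2025.09 | 230 | - | 392 | - | 52.3 | 31.6
2025.09 | 212 | - | 564 | - | 61.4 | 35
2025.09 | 210 | - | 167 | - | 52.7 | 25.8
2025.09 | 162 | - | 428 | - | 59 | 28.1
2025.09 | 143 | - | 26 | - | 49.3 | 18.5
2025.09 | 137 | - | 104 | - | 54.2 | 24.2
2025.09 | 41 | - | 35 | - | 50 | 11.5
2025.09 | 37 | - | 15 | - | 42.4 | 15.2
2026.02 | 32.78 | 21.02 | 85.5 | 99 | 67.83 | 61.23
2026.02 | 32.37 | 19.06 | - | 34 | - | 28.48
2026.02 | 30.88 | 15.07 | 86.4 | 97.88 | 63.17 | 58.68
2026.02 | 30.81 | 16.89 | 83.7 | 98.88 | 67.72 | 59.6
2026.02 | 29.76 | 13.1 | 79.4 | 99.63 | 68.67 | 58.11
2026.02 | 29.18 | 6.46 | 74.6 | 99.75 | 68.38 | 55.74
2026.02 | 28.46 | 8.39 | 80.2 | 99.13 | 67.44 | 56.86
2026.02 | 27.89 | 18.23 | 84.7 | 76 | 56.78 | 52.72
2026.02 | 25.7 | 9.28 | 77 | 98.5 | 67.5 | 55.6
2026.02 | 25.68 | 7.44 | 73.1 | 99.75 | 68.06 | 54.81
2026.02 | 21.76 | 8.52 | 83.4 | 98.13 | 66.67 | 55.7
2026.02 | 18.55 | 10.16 | 79.8 | 98.5 | 66.11 | 54.62
2026.02 | 11.1 | 7.67 | 85.5 | 97.63 | 62.78 | 52.94
2026.02 | 9.33 | 9.31 | 77.3 | 82.13 | 62.05 | 48.02
2026.02 | 9.06 | 7.93 | - | 26.13 | - | 14.37
2026.02 | 8.6 | 9.24 | - | 36.86 | - | 18.23
2026.02 | 6.93 | 10.31 | 82.2 | 94.38 | 59.72 | 50.71
2026.02 | 6.31 | 8.21 | 82.1 | 71.38 | 53.44 | 44.29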