Long-context Language Understanding on LongBench (7-Category Performance Summary)

Math Performance: 80.29

16-bit Baseline

Mar 25, 2025
Updated 15d ago

Evaluation Results

Method    Links
2025.03    80.29    55.97    52.58    33.55    42.47    17.56    48
2025.03    71.42    59.78    61.21    39.95    47.71    18.07    67.78
2025.03    70.28    57.47    59.02    39.72    42.48    17.21    61.33
2025.03    63.31    49.37    58.25    38.01    41.37    17.24    52.17
2025.03    59.82    37.48    57.5     37.91    40.39    17.17    46.85
2025.03    56.18    52.46    53.88    33.05    39.26    17.11    26.5
2025.03    52.99    58.23    61.9     33.35    44.66    16.33    43
2025.03    51.86    40.84    39.36    21.7     23.63    9.89     5.39
2025.03    49.28    40.68    52.54    32.04    37.22    17.38    13.5
2025.03    40.41    52.09    56.42    36.08    41.9     16.62    52.51
2025.03    39.27    34.79    51.32    31.08    35.8     17.16    10
2025.03    34.34    48.71    51.23    28.28    34.84    13.13    22.83
2025.03    18.04    43.06    52.5     34.01    38.89    16.14    5.02
2025.03    12.59    33.97    36.17    18.19    19.58    9.1      4.83
2025.03    3.71     35.91    35.26    12.35    20.52    9.31     11.42