
Long-context Language Understanding on LongBench (Specific Metric Subset including NQA and Delta Avg)

NQA: 34.92

H2O

[Chart: NQA score; date label Apr 18, 2026]
Updated 1mo ago

Evaluation Results

Columns: date, NQA, seven further scores whose labels are missing, Avg (mean of the eight scores), Delta Avg. Method names and links are missing.

Date     NQA    (2)    (3)    (4)    (5)    (6)    (7)    (8)    Avg    Delta Avg
2026.04  34.92  54.31  64.45  47.08  32.53  34.92  98     69.73  54.49  -0.15
2026.04  34.81  54.24  64.45  47.08  32.7   34.93  98     70.88  54.64  -
2026.04  34.43  54.67  64.35  46.75  33.01  34.84  98     70.46  54.56  -0.08
2026.04  30.24  52.32  62.96  41.28  19.62  31.62  95.29  68.01  50.17  -4.47
2026.04  30.08  53.96  55.71  29.62  32.14  33.87  99.5   64.24  49.89  -0.69
2026.04  30     55.14  56.15  30.08  33.85  35.2   99.5   64.73  50.58  -
2026.04  29.69  54.99  56.1   30.55  34.11  35.15  99.5   64.81  50.61  0.03
2026.04  26.02  52.92  52.79  26.54  18.75  31.61  92.5   64.21  45.67  -4.91
2026.04  25.35  46.42  55.23  26.15  16.39  33.41  60.5   68.95  41.55  -13.09
2026.04  20.7   38.05  40.18  15.9   14.93  31.29  36.5   65.03  32.82  -17.76
2026.04  12.25  36.56  50.73  26.95  21.04  29.3   58     72.84  38.46  -0.81
2026.04  11.92  37.6   51.66  27.55  21.66  30.62  59.5   73.62  39.27  -
2026.04  11.66  37.24  52.01  27.81  20.81  30.13  50.5   73.22  37.92  -1.35
2026.04  10.65  34.29  43.97  17.52  12.94  24.43  33.5   71.87  31.15  -8.12
2026.04  4.29   31.17  43.07  17.93  13.75  22.61  19.5   62.83  26.89  -12.38
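
The page does not say how the last two values in each row are computed. From the numbers themselves, the ninth value appears to be the mean of the eight scores before it, and the final value (Delta Avg) appears to be that mean minus the mean of a reference row, the one whose Delta Avg is shown as "-". The sketch below checks this reading on the first two result rows. The function names `average` and `delta_avg` are hypothetical and do not come from the page; rounding to two decimals is an assumption that matches the displayed figures.

```python
# Eight scores from the first result row (its listed Delta Avg is -0.15).
row = [34.92, 54.31, 64.45, 47.08, 32.53, 34.92, 98, 69.73]

# Eight scores from the second result row, whose Delta Avg is "-".
# Treating it as the reference row is an assumption inferred from the data.
reference = [34.81, 54.24, 64.45, 47.08, 32.7, 34.93, 98, 70.88]


def average(scores):
    """Mean of the per-task scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


def delta_avg(scores, reference_scores):
    """Difference between a row's average and the reference row's average."""
    return round(average(scores) - average(reference_scores), 2)


print(average(row))               # 54.49
print(average(reference))         # 54.64
print(delta_avg(row, reference))  # -0.15
```

All three printed values match the figures listed for those two rows. The same relationship holds for the other rows, each measured against the nearest row marked "-" with a comparable score range.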