
Long-context Question Answering on LongBench V2

56.77 Overall Accuracy

Gemini-2.5-Flash-Thinking

[Chart: Overall Accuracy over time, Jan 18, 2026 to May 7, 2026]
Updated 26d ago

Evaluation Results

Method | Links
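Date | Overall Accuracy | remaining score columns (unlabeled)
2026.01 | 56.77 | 51.43 | 55.2 | 58 | 66.67 | - | - | - | - | - | - | - | - | - | -
2026.01 | 54.27 | 47.43 | 54.4 | 60 | 76.92 | - | - | - | - | - | - | - | - | - | -
2026.01 | 49.11 | 46.29 | 42.4 | 56 | 71.79 | - | - | - | - | - | - | - | - | - | -
2026.01 | 48.31 | 44 | 44 | 46 | 64.1 | - | - | - | - | - | - | - | - | - | -
2026.01 | 47.91 | 50.29 | 50.4 | 42 | 56.41 | - | - | - | - | - | - | - | - | - | -
2026.01 | 47.01 | 44.57 | 43.2 | 53.06 | 61.54 | - | - | - | - | - | - | - | - | - | -
2026.01 | 44.53 | 40 | 46.4 | 50 | 51.28 | - | - | - | - | - | - | - | - | - | -
2026.01 | 44.43 | 42.43 | 38.4 | 52 | 62.82 | - | - | - | - | - | - | - | - | - | -
2026.01 | 43.74 | 44 | 39.2 | 50 | 46.15 | - | - | - | - | - | - | - | - | - | -
2026.01 | 43.37 | 38.51 | 40.8 | 56 | 61.54 | - | - | - | - | - | - | - | - | - | -
2026.01 | 42.94 | 38.14 | 40.6 | 48 | 63.46 | - | - | - | - | - | - | - | - | - | -
 | 42.31 | - | - | - | - | 43.48 | 40.43 | 42.42 | 48.65 | 36.59 | - | - | - | - | -
2026.01 | 42.3 | 40.29 | 40 | 43 | 62.82 | - | - | - | - | - | - | - | - | - | -
2026.01 | 42.1 | 39.14 | 37 | 49.5 | 60.9 | - | - | - | - | - | - | - | - | - | -
2026.01 | 41.75 | 37.14 | 40.8 | 46 | 66.67 | - | - | - | - | - | - | - | - | - | -
2026.01 | 40.46 | 37 | 35.6 | 41.5 | 60.26 | - | - | - | - | - | - | - | - | - | -
2026.04 | 39.33 | - | - | - | - | 39.13 | 40.43 | 42.42 | 40.54 | 34.15 | - | - | - | - | -
2026.01 | 37.28 | 36 | 30.6 | 36.5 | 60.26 | - | - | - | - | - | - | - | - | - | -
2026.04 | 35.85 | - | - | - | - | 39.13 | 39.36 | 36.36 | 35.14 | 29.27 | - | - | - | - | -
2026.01 | 33.7 | 39.57 | 28.4 | 29.5 | 31.41 | - | - | - | - | - | - | - | - | - | -
2026.01 | 33.6 | 36.71 | 28.2 | 32 | 35.9 | - | - | - | - | - | - | - | - | - | -
2026.05 | 31.2 | - | - | - | - | - | 26.6 | 42.4 | - | - | 30 | 26.7 | 33.3 | 35 | 33.3
2026.01 | 31.01 | 34.43 | 29 | 30 | 40.38 | - | - | - | - | - | - | - | - | - | -
 | 30.96 | - | - | - | - | 30.43 | 27.27 | 32.98 | 32.43 | 31.71 | - | - | - | - | -
2026.05 | 30.8 | - | - | - | - | - | 27.7 | 45.5 | - | - | 30 | 26.7 | 28.6 | 35 | 22.2
2026.05 | 30.3 | - | - | - | - | - | 26.6 | 48.5 | - | - | 30 | 33.3 | 19 | 30 | 27.8
2026.01 | 29.62 | 30.14 | 29.8 | 32 | 34.62 | - | - | - | - | - | - | - | - | - | -
2026.01 | 28.93 | 31.86 | 29.2 | 29 | 32.69 | - | - | - | - | - | - | - | - | - | -
2026.04 | 28.91 | - | - | - | - | 30.43 | 33.33 | 24.24 | 29.73 | 26.83 | - | - | - | - | -
2026.01 | 27.93 | 28.86 | 26.2 | 27 | 32.05 | - | - | - | - | - | - | - | - | - | -
2026.05 | 25.3 | - | - | - | - | - | 21.3 | 27.3 | - | - | 30 | 20 | 38.1 | 15 | 38.9
2026.05 | 22.2 | - | - | - | - | - | 18.1 | 27.3 | - | - | 20 | 33.3 | 28.6 | 30 | 11.1
2026.05 | 22.2 | - | - | - | - | - | 19.1 | 27.3 | - | - | 25 | 26.7 | 28.6 | 15 | 22.2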