
Long-context reasoning on OfficeQA

57.14 Accuracy

GEMINI 3.1 FLASH-LITE

[Chart: Accuracy; Apr 6, 2026]
Updated 11d ago

Evaluation Results

Method Links
57.14
2026.04
55.74
53.37
2026.04
46.58
2026.04
37.84
33.88
2026.04
26.53
2026.04
21.63
2026.04
14.88
2026.04
13.58