Long-context reasoning on LongSeal
[Chart: Accuracy. Highlighted point: GEMINI 3.1 FLASH-LITE, 64.96, Apr 6, 2026]
Updated 10d ago
Evaluation Results

Method                   Settings                      Date      Accuracy
GEMINI 3.1 FLASH-LITE    -                             2026.04   64.96
GPT-OSS-20B              Reasoning Effort=High,...     2026.04   64
GEMINI-2.5-PRO           -                             2026.04   59.84
QWEN3.5-35B-A3B-FP8      Precision=FP8                 2026.04   58.5
GPT-OSS-20B              Reasoning Effort=High,...     2026.04   52.17
GPT-OSS-120B             -                             2026.04   42.18
QWEN3-4B-INSTRUCT-2507   Training Status=SFT (π...     2026.04   40.94
GPT-OSS-20B              Reasoning Effort=Low,...      2026.04   38.19
GPT-OSS-20B              Reasoning Effort=Low,...      2026.04   34.65
QWEN3-4B-INSTRUCT-2507   Training Status=base          2026.04   33.07