Long-Context Reasoning on LOONG
Best result: Gemini-3.0-pro, 65.43 accuracy
[Chart: Accuracy over time on LOONG, Mar 2026]
Evaluation Results

Method                                    Tags                       Date     Accuracy
Gemini-3.0-pro                                                       2026.03  65.43
Deepseek-v3.1                                                        2026.03  50.55
Deepseek-R1-Distill-Qwen-32B + TableLong  Optimization strategy=...  2026.03  45.3
Qwen-Long-L1                                                         2026.03  44.68
Qwen3-32B + TableLong                     Optimization strategy=...  2026.03  43.1
Qwen3-32B                                 Optimization strategy=...  2026.03  39.96
Qwen2.5-32B-Instruct + TableLong          Optimization strategy=...  2026.03  38.18
Deepseek-R1-Distill-Qwen-32B              Optimization strategy=...  2026.03  38.17
Deepseek-R1-Distill-Qwen-14B + TableLong  Optimization strategy=...  2026.03  35.44
Qwen2.5-32B-Instruct                      Optimization strategy=...  2026.03  33.22
Deepseek-R1-Distill-Qwen-14B              Optimization strategy=...  2026.03  25.07
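Four of the entries appear both with and without "+ TableLong", so the per-model change can be read off directly. The following is a minimal Python sketch that assumes each "+ TableLong" entry is the same base model evaluated with TableLong applied. The page itself does not describe TableLong. The dictionary and function names are hypothetical, and the accuracy values are copied from the results listed above.

```python
# Hypothetical helper: accuracy values copied from the LOONG results listed above.
# Assumption: each "+ TableLong" row is the same base model with TableLong applied.
results = {
    # model name: (base accuracy, "+ TableLong" accuracy)
    "Deepseek-R1-Distill-Qwen-32B": (38.17, 45.3),
    "Qwen3-32B": (39.96, 43.1),
    "Qwen2.5-32B-Instruct": (33.22, 38.18),
    "Deepseek-R1-Distill-Qwen-14B": (25.07, 35.44),
}


def tablelong_gains(rows):
    """Return the absolute accuracy-point gain of each '+ TableLong' row over its base row."""
    # round() to 2 decimals hides binary floating-point noise from the subtraction
    return {name: round(with_tl - base, 2) for name, (base, with_tl) in rows.items()}


if __name__ == "__main__":
    # Print models ordered from largest to smallest gain
    for name, gain in sorted(tablelong_gains(results).items(), key=lambda kv: -kv[1]):
        print(f"{name:32s} +{gain:.2f}")
```

Running it prints the gains in descending order: Deepseek-R1-Distill-Qwen-14B (+10.37), Deepseek-R1-Distill-Qwen-32B (+7.13), Qwen2.5-32B-Instruct (+4.96), Qwen3-32B (+3.14).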