Long-Context Reasoning on Oolong-Synth
[Chart: Accuracy by date. Top result: Gemini-3.0-pro, 78.41 accuracy, Mar 23, 2026]
Updated 25d ago
Evaluation Results
Method                                     Date      Accuracy
Gemini-3.0-pro                             2026.03   78.41
Deepseek-v3.1                              2026.03   51.9
Deepseek-R1-Distill-Qwen-32B + TableLong   2026.03   51.41
Deepseek-R1-Distill-Qwen-14B + TableLong   2026.03   45.95
Qwen3-32B + TableLong                      2026.03   42.31
Deepseek-R1-Distill-Qwen-14B               2026.03   42.05
Qwen2.5-32B-Instruct                       2026.03   41.03
Deepseek-R1-Distill-Qwen-32B               2026.03   39.43
Qwen2.5-32B-Instruct + TableLong           2026.03   35.5
Qwen3-32B                                  2026.03   31.66
Qwen-Long-L1                               2026.03   27.37