Science Question Answering on FrontierScience
Best result: Qwen3.5-397B, 70.5 Accuracy
[Chart: Accuracy over time, Mar 22, 2026 to May 22, 2026]
Updated 9d ago
Evaluation Results

Method              Details                     Date     Accuracy
Qwen3.5-397B        Base Model=Qwen3.5-397B     2026.05  70.5
Qwen3.5-122B        Base Model=Qwen3.5-122B     2026.05  59.8
Kimi-K2.5           Base Model=Kimi-K2.5        2026.05  55.2
Qwen3.5-35B         Base Model=Qwen3.5-35B      2026.05  54.3
Deepseek-V3.2       Base Model=Deepseek-V3.2    2026.05  54.2
Qwen3.5-9B + MaR    Base Model=Qwen3.5-9B,...   2026.05  50
Qwen3.5-9B          Base Model=Qwen3.5-9B       2026.05  42.4
Qwen3.5-9B + DAPO   Base Model=Qwen3.5-9B,...   2026.05  39
GLM-5.1             Base Model=GLM-5.1          2026.05  38.2
ARYA                Prompting Strategy=Zer...   2026.03  37.5
Qwen3.5-4B          Base Model=Qwen3.5-4B       2026.05  34.7
GPT-OSS-120B        Base Model=GPT-OSS-120B     2026.05  34.4
Qwen3.5-4B + MaR    Base Model=Qwen3.5-4B,...   2026.05  34
Qwen3.5-4B + DAPO   Base Model=Qwen3.5-4B,...   2026.05  33
GPT-5.2             Prompting Strategy=Opt...   2026.03  25.8
GPT-5.2 (pub)       Context=Best Published...   2026.03  25.8
Claude Opus 4.6     Prompting Strategy=Zer...   2026.03  8.8
GPT-5.2             Prompting Strategy=Zer...   2026.03  7.5