Scientific Reasoning on SuperGPQA (Accuracy)
[Chart: Accuracy over time, Jan 6, 2026 to Jan 20, 2026. Best result: Kimi-K2 Base, 44.7.]
Updated 2d ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| Kimi-K2 Base | # Shots=5-shot, # Acti... | 2026.01 | 44.7 |
| DeepSeek-V3.2 Exp Base | # Shots=5-shot, # Acti... | 2026.01 | 43.6 |
| DeepSeek-V3.1 Base | # Shots=5-shot, # Acti... | 2026.01 | 42.3 |
| MiMo-V2-Flash Base | # Shots=5-shot, # Acti... | 2026.01 | 41.1 |
| Jet-RL | Model=Qwen3-8B-Base, R... | 2026.01 | 35.2 |
| BF16 | Model=Qwen3-8B-Base, R... | 2026.01 | 35.1 |
| Initial | Model=Qwen3-8B-Base, R... | 2026.01 | 31.8 |
| BF16-Train-FP8-Rollout | Model=Qwen3-8B-Base, R... | 2026.01 | 30.3 |
| BF16 | Model=Qwen2.5-7B, Roll... | 2026.01 | 28.6 |
| Jet-RL | Model=Qwen2.5-7B, Roll... | 2026.01 | 28.5 |
| Initial | Model=Qwen2.5-7B, Roll... | 2026.01 | 25.5 |
| Jet-RL | Model=Llama3.1-8B, Rol... | 2026.01 | 19.9 |
| BF16 | Model=Llama3.1-8B, Rol... | 2026.01 | 15.9 |
| BF16-Train-FP8-Rollout | Model=Llama3.1-8B, Rol... | 2026.01 | 14.7 |
| Initial | Model=Llama3.1-8B, Rol... | 2026.01 | 12.4 |