Code Reasoning on CRUXEval-O
Top result: Kimi-K2 Base, 83.5 accuracy
[Chart: Accuracy over time, Jan 6, 2026 to Feb 21, 2026]
Updated 4d ago
Evaluation Results
| Method | Settings | Date | Accuracy |
|---|---|---|---|
| Kimi-K2 Base | # Shots=1-shot, # Acti... | 2026.01 | 83.5 |
| MiMo-V2-Flash Base | # Shots=1-shot, # Acti... | 2026.01 | 79.1 |
| DeepSeek-V3.1 Base | # Shots=1-shot, # Acti... | 2026.01 | 76.4 |
| DeepSeek-V3.2 Exp Base | # Shots=1-shot, # Acti... | 2026.01 | 74.9 |
| MP | Model=Qwen-3-8B, Promp... | 2026.02 | 56.5 |
| Ann Brown | Model=Qwen-3-8B, Promp... | 2026.02 | 55.88 |
| CoT | Model=Qwen-3-8B, Promp... | 2026.02 | 55.75 |
| Std | Model=Qwen-3-8B, Promp... | 2026.02 | 55.5 |
| CoT | Model=Llama-3-8B, Prom... | 2026.02 | 25.62 |
| Std | Model=Llama-3-8B, Prom... | 2026.02 | 5.25 |
| Ann Brown | Model=Llama-3-8B, Prom... | 2026.02 | 5.12 |
| MP | Model=Llama-3-8B, Prom... | 2026.02 | 3.38 |