Adversarial Code Compliance on Overall Mean
[Chart: Decoupling Probability and Severity Index. Highlighted result: Llama-3.1-8B, Decoupling Probability 97.1. Date shown: Jan 29, 2026.]
Updated 4d ago
Evaluation Results
| Method | Model Category | Date | Decoupling Probability | Severity Index |
|---|---|---|---|---|
| Llama-3.1-8B | Open So... | 2026.01 | 97.1 | 48.4 |
| DeepSeek-v3.2 | Open So... | 2026.01 | 95.8 | 51 |
| Gemma-3-27b | Open So... | 2026.01 | 92.2 | 38 |
| Qwen3-235B | Open So... | 2026.01 | 89 | 30.4 |
| Gemini-2.5-Flash | Proprie... | 2026.01 | 83.6 | 42.9 |
| Llama-3.2-3B | Open So... | 2026.01 | 77.7 | 31.7 |
| GPT-5 | Proprie... | 2026.01 | 71.6 | 28.3 |
| GPT-OSS-120B | Open So... | 2026.01 | 32.4 | 1.6 |
| GPT-5-Mini | Proprie... | 2026.01 | 29.5 | 0.6 |