Adversarial Code Compliance on Java
Chart (image not reproduced): Decoupling Probability and Severity Index. Highlighted point: Llama-3.1-8B, Decoupling Probability 91.7 (Jan 29, 2026).
Updated 4d ago
Evaluation Results
| Method | Model Category | Date | Decoupling Probability | Severity Index |
|---|---|---|---|---|
| Llama-3.1-8B | Open So... | 2026.01 | 91.7 | 41.5 |
| DeepSeek-v3.2 | Open So... | 2026.01 | 90 | 48.4 |
| Gemma-3-27b | Open So... | 2026.01 | 89.1 | 36.1 |
| Qwen3-235B | Open So... | 2026.01 | 85 | 31.4 |
| Llama-3.2-3B | Open So... | 2026.01 | 81.6 | 30.1 |
| Gemini-2.5-Flash | Proprie... | 2026.01 | 78 | 39 |
| GPT-5 | Proprie... | 2026.01 | 60.1 | 17.2 |
| GPT-5-Mini | Proprie... | 2026.01 | 34.4 | 0.9 |
| GPT-OSS-120B | Open So... | 2026.01 | 32.1 | 1.3 |