Cybersecurity Benchmarking on ScBen En
[Chart: En score by date. Best result: GPT-5, 87.48 (Jan 29, 2026).]
Evaluation Results
| Method | Details | Date | En |
|---|---|---|---|
| GPT-5 | evaluation_context=Lar... | 2026.01 | 87.48 |
| Qwen3-32B | evaluation_context=Lar... | 2026.01 | 84.23 |
| RedSage-8B-CFW | evaluation_context=Bas... | 2026.01 | 83.62 |
| Qwen3-8B-Base | evaluation_context=Bas... | 2026.01 | 82.84 |
| RedSage-8B-Base | evaluation_context=Bas... | 2026.01 | 81.76 |
| RedSage-8B-Seed | evaluation_context=Bas... | 2026.01 | 81.61 |
| RedSage-8B-DPO | evaluation_context=Ins... | 2026.01 | 80.06 |
| RedSage-8B-Ins | evaluation_context=Ins... | 2026.01 | 79.91 |
| Qwen3-8B | evaluation_context=Ins... | 2026.01 | 73.26 |
| Llama-3.1-8B | evaluation_context=Bas... | 2026.01 | 72.80 |
| DeepHat-V1-7B | evaluation_context=Ins... | 2026.01 | 70.63 |
| Foundation-Sec-8B | evaluation_context=Bas... | 2026.01 | 69.86 |
| Foundation-Sec-8B-Instruct | evaluation_context=Ins... | 2026.01 | 68.78 |
| Llama-Primus-Merged | evaluation_context=Ins... | 2026.01 | 64.91 |
| Llama-Primus-Base | evaluation_context=Ins... | 2026.01 | 63.68 |
| Llama-3.1-8B-Instruct | evaluation_context=Ins... | 2026.01 | 59.66 |
| Lily-Cybersecurity-7B-v0.2 | evaluation_context=Ins... | 2026.01 | 57.65 |