Security Analysis on Security Tasks: 15,000 benign samples (test)
[Chart: F1 (Log). Best result: SecureCAI, 96.8 (Jan 12, 2026). Selectable metrics: F1 (Log), Accuracy (Phishing), Human Eval (Malware), Average Score.]
Evaluation Results
Method            Date     F1 (Log)  Accuracy (Phishing)  Human Eval (Malware)  Average Score
SecureCAI         2026.01  96.8      95.4                 93.2                  95.1
Base LLM          2026.01  94.2      91.8                 88.6                  91.5
Standard CAI      2026.01  92.7      90.1                 86.4                  89.7
Instr. Hierarchy  2026.01  89.4      87.2                 83.5                  86.7
Input Filtering   2026.01  78.3      75.6                 71.2                  75.0
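The page does not say how Average Score is derived. The reported values are consistent with an unweighted mean of the three per-task metrics, rounded to one decimal place, and the short Python sketch below checks that assumption against the results above. The `results` dictionary and the `mean_of_metrics` helper are illustrative names, not part of the leaderboard.

```python
# Assumption: Average Score is the unweighted mean of the three task
# metrics, rounded to one decimal. The source page does not confirm this.

# method: (F1 Log, Accuracy Phishing, Human Eval Malware, reported Average)
results = {
    "SecureCAI": (96.8, 95.4, 93.2, 95.1),
    "Base LLM": (94.2, 91.8, 88.6, 91.5),
    "Standard CAI": (92.7, 90.1, 86.4, 89.7),
    "Instr. Hierarchy": (89.4, 87.2, 83.5, 86.7),
    "Input Filtering": (78.3, 75.6, 71.2, 75.0),
}


def mean_of_metrics(f1_log, acc_phishing, human_eval_malware):
    """Return the unweighted mean of the three metrics, to one decimal."""
    return round((f1_log + acc_phishing + human_eval_malware) / 3, 1)


for method, (f1, acc, he, reported) in results.items():
    computed = mean_of_metrics(f1, acc, he)
    # Tolerance guards against floating-point noise in the comparison.
    status = "matches" if abs(computed - reported) < 0.05 else "differs"
    print(f"{method}: computed {computed}, reported {reported} ({status})")
```

Under this assumption every row matches its reported average, so the three metrics appear to be weighted equally.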