Tool Misuse on Single-Agent Evaluation Set
[Chart: R@5 by date; plotted entry: Query+, R@5 = 1, Jan 11, 2026. Available metrics: R@5, SIM, Sent, Phishing Rate.]
Updated 4d ago
Evaluation Results

Method         Model        Date     R@5   SIM   Sent  Phishing Rate
Query+         GPT-4o       2026.01  1     0.78  -     -
CEM Attack     GPT-4o       2026.01  1     0.83  -     -
Query+         GPT-4o-mini  2026.01  1     0.78  -     -
CEM Attack     GPT-4o-mini  2026.01  1     0.83  -     -
fusion attack  GPT-4o-mini  2026.01  1     0.87  -     -
fusion attack  GPT-4o       2026.01  0.98  0.87  -     -
Ideal          GPT-4o       2026.01  -     -     100   100