Targeted Answer on Single-Agent Evaluation Set
[Chart: leaderboard over time, with metric tabs R@5, SIM, and ASR. Highlighted point: Query+, R@5 = 100, Jan 11, 2026. Updated 4d ago.]
Evaluation Results
| Method | Model | Date | R@5 | SIM | ASR |
|---|---|---|---|---|---|
| Query+ | GPT-4o | 2026.01 | 100 | 0.76 | - |
| CEM Attack | GPT-4o | 2026.01 | 100 | 0.85 | - |
| fusion attack | GPT-4o | 2026.01 | 100 | 0.88 | - |
| Query+ | GPT-4o-mini | 2026.01 | 100 | 0.76 | - |
| fusion attack | GPT-4o-mini | 2026.01 | 100 | 0.89 | - |
| CEM Attack | GPT-4o-mini | 2026.01 | 98 | 0.85 | - |
| Query+ | GPT-4o | 2026.01 | 1 | 0.73 | - |
| CEM Attack | GPT-4o | 2026.01 | 1 | 0.79 | - |
| fusion attack | GPT-4o | 2026.01 | 1 | 0.85 | - |
| Query+ | GPT-4o-mini | 2026.01 | 1 | 0.73 | - |
| CEM Attack | GPT-4o-mini | 2026.01 | 1 | 0.79 | - |
| fusion attack | GPT-4o-mini | 2026.01 | 1 | 0.85 | - |