SQL Agent data leakage evaluation on Employee Big
[Chart: Average Benign Accuracy (BA) by date; series shown: o4-mini; date shown: Feb 13, 2026. Selectable metrics: Average Benign Accuracy (BA), Robust Benign Accuracy (RA), Expected Queries for Attack (E).]
Updated 4d ago
Evaluation Results
BA = Average Benign Accuracy, RA = Robust Benign Accuracy, E = Expected Queries for Attack. A "-" means no value is listed.

Method                                        Date     BA   RA    E
o4-mini (Setup=Orchestrator wit...)           2026.02  100  78.2  -
claude-sonnet-4 (Setup=Orchestrator wit...)   2026.02  100  93.6  -
gemini-2.5-flash (Setup=Orchestrator wit...)  2026.02  100  62.2  9
o4-mini                                       2026.02  100  76.4  -
claude-sonnet-4                               2026.02  100  95    -
gemini-2.5-flash                              2026.02  100  66.4  14
gpt-4.1-mini (Setup=Orchestrator wit...)      2026.02  96   71.6  6
gpt-4.1-mini                                  2026.02  96   73.8  9
gpt-4.1 (Setup=Orchestrator wit...)           2026.02  92   61.6  18
gpt-4.1                                       2026.02  92   64.8  56