SQL Agent data leakage evaluation on Employee Medium
[Chart: Benign Accuracy (BA) over time. Plotted point: gpt-4.1-mini, 100, Feb 13, 2026. Selectable metrics: Benign Accuracy (BA), Robust Accuracy (RA), Expected Queries for Attack (E).]
Updated 4d ago
Evaluation Results
Method           | Setup                     | Date    | Benign Accuracy (BA) | Robust Accuracy (RA) | Expected Queries for Attack (E)
-----------------|---------------------------|---------|----------------------|----------------------|--------------------------------
gpt-4.1-mini     | Setup=Orchestrator wit... | 2026.02 | 100                  | 73.6                 | 4
o4-mini          | Setup=Orchestrator wit... | 2026.02 | 100                  | 84.6                 | -
claude-sonnet-4  | Setup=Orchestrator wit... | 2026.02 | 100                  | 93.6                 | -
gemini-2.5-flash | Setup=Orchestrator wit... | 2026.02 | 100                  | 61.8                 | 17
gpt-4.1-mini     |                           | 2026.02 | 100                  | 76.2                 | 7
o4-mini          |                           | 2026.02 | 100                  | 84                   | -
claude-sonnet-4  |                           | 2026.02 | 100                  | 95.2                 | -
gemini-2.5-flash |                           | 2026.02 | 100                  | 63.6                 | 20
gpt-4.1          | Setup=Orchestrator wit... | 2026.02 | 98                   | 63.4                 | 17
gpt-4.1          |                           | 2026.02 | 98                   | 64.6                 | 34
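
For readers who want to work with these numbers directly, the following is a minimal sketch that transcribes the reported results and ranks them by Robust Accuracy (RA). The tuple layout, the `ROWS` name, and the `rank_by_robust_accuracy` helper are illustrative assumptions and not part of the benchmark itself; the truncated setup tag is kept exactly as it appears on the page, and a missing E value (shown as "-") is represented as `None`.

```python
from typing import List, Optional, Tuple

# (method, setup_tag, BA, RA, E) transcribed from the results listed above.
# setup_tag is None where no tag is shown; the truncated tag is kept as-is.
# E is None where the page shows "-".
Row = Tuple[str, Optional[str], float, float, Optional[int]]

TAG = "Setup=Orchestrator wit..."

ROWS: List[Row] = [
    ("gpt-4.1-mini", TAG, 100.0, 73.6, 4),
    ("o4-mini", TAG, 100.0, 84.6, None),
    ("claude-sonnet-4", TAG, 100.0, 93.6, None),
    ("gemini-2.5-flash", TAG, 100.0, 61.8, 17),
    ("gpt-4.1-mini", None, 100.0, 76.2, 7),
    ("o4-mini", None, 100.0, 84.0, None),
    ("claude-sonnet-4", None, 100.0, 95.2, None),
    ("gemini-2.5-flash", None, 100.0, 63.6, 20),
    ("gpt-4.1", TAG, 98.0, 63.4, 17),
    ("gpt-4.1", None, 98.0, 64.6, 34),
]


def rank_by_robust_accuracy(rows: List[Row]) -> List[Row]:
    """Return the rows sorted by RA, highest first (hypothetical helper)."""
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for method, tag, ba, ra, e in rank_by_robust_accuracy(ROWS):
        e_text = "-" if e is None else str(e)
        tag_text = tag or ""
        print(f"{method:<17} {tag_text:<26} BA={ba:<6} RA={ra:<6} E={e_text}")
```

Sorting on a single column keeps the sketch neutral about how BA, RA, and E should be weighed against each other, since the page does not define a combined score.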