SQL Agent data leakage evaluation on Employee Big
[Chart: Average Benign Accuracy (BA); o4-mini, Feb 13, 2026. Also tracked: Robust Benign Accuracy (RA), Expected Queries for Attack (E).]
Evaluation Results
| Method | Setup | Date | Average Benign Accuracy (BA) | Robust Benign Accuracy (RA) | Expected Queries for Attack (E) |
|---|---|---|---|---|---|
| o4-mini | Setup=Orchestrator wit... | 2026.02 | 100 | 78.2 | - |
| claude-sonnet-4 | Setup=Orchestrator wit... | 2026.02 | 100 | 93.6 | - |
| gemini-2.5-flash | Setup=Orchestrator wit... | 2026.02 | 100 | 62.2 | 9 |
| o4-mini | | 2026.02 | 100 | 76.4 | - |
| claude-sonnet-4 | | 2026.02 | 100 | 95 | - |
| gemini-2.5-flash | | 2026.02 | 100 | 66.4 | 14 |
| gpt-4.1-mini | Setup=Orchestrator wit... | 2026.02 | 96 | 71.6 | 6 |
| gpt-4.1-mini | | 2026.02 | 96 | 73.8 | 9 |
| gpt-4.1 | Setup=Orchestrator wit... | 2026.02 | 92 | 61.6 | 18 |
| gpt-4.1 | | 2026.02 | 92 | 64.8 | 56 |