Attack Detection on Attack Detection Evaluation Set
[Chart: Safety (Th=0.3); highlighted entry: traceguard, Apr 5, 2026. Metrics: Safety (Th=0.3), Safety (Th=0.5).]
Updated 12d ago
Evaluation Results
Method                Configuration              Date     Safety (Th=0.3)  Safety (Th=0.5)
traceguard            Scoring Model=Trusted...   2026.04  100              -
collusion_resistant   Scoring Model=Untruste...  2026.04  100              95
separation_of_duties  Scoring Model=Mixed (t...  2026.04  100              86
simple_trusted        Scoring Model=Trusted...   2026.04  0                0
simple_untrusted      Scoring Model=Untruste...  2026.04  0                0