Malicious Action Detection on 100-seed stress orig profile (test)
[Chart: AMR by method over time; selectable metrics: AMR, RNR, Balanced Utility. Plotted point: Prior exploration baseline, AMR 100, May 17, 2026.]
Updated 15d ago
Evaluation Results
| Method | Profile | Date | AMR | RNR | Balanced Utility |
|---|---|---|---|---|---|
| Prior exploration baseline | orig | 2026.05 | 100 | 0 | 50 |
| ToolShield (grouped replay) | orig | 2026.05 | 10.87 | 0 | 94.57 |
| VISTA-Guard precursor (grouped risk model) | orig | 2026.05 | 4.35 | 1.85 | 96.9 |
| MCPMark direct | orig | 2026.05 | 2.17 | 46.3 | 64.47 |