Malicious Action Detection on stress test 100-seed redcode_style profile
[Chart: AMR by date. Data point shown: Prior exploration baseline, AMR 100, May 17, 2026. Selectable metrics: AMR, RNR, Balanced Utility.]
Updated 15d ago
Evaluation Results
| Method | AMR | RNR | Balanced Utility |
| --- | --- | --- | --- |
| Prior exploration baseline (Profile=redcode_style, 2026.05) | 100 | 0 | 50 |
| MCPMark direct (Profile=redcode_style, 2026.05) | 80.43 | 46.3 | 25.34 |
| ToolShield (grouped replay) (Profile=redcode_style, 2026.05) | 21.74 | 0 | 89.13 |
| VISTA-Guard precursor (grouped risk model) (Profile=redcode_style, 2026.05) | 17.39 | 1.85 | 90.38 |