Safety and Utility Evaluation on MaliciousGen & WildChat
[Chart: Rule Adherence. Highlighted entry: Step-Level, 97.69, dated Jan 12, 2026. Selectable metrics: Rule Adherence, MD-Judge Score, RM Score, MT-1 Score.]
Updated 3mo ago
Evaluation Results
| Method | Configuration | Date | Rule Adherence | MD-Judge Score | RM Score | MT-1 Score |
|---|---|---|---|---|---|---|
| Step-Level | Backbone=Qwen2.5-7B, M... | 2026.01 | 97.69 | 95.38 | -0.8 | 5.24 |
| Shadow-Level | Backbone=Qwen2.5-7B, M... | 2026.01 | 97.69 | 96.35 | -0.77 | 5.68 |
| Client-Level | Backbone=Qwen2.5-7B, M... | 2026.01 | 97.31 | 95.77 | -0.76 | 5.33 |