Safety Evaluation on Trotter
[Chart: OverRefusal Score. Highlighted entry: Base, 30.8 (May 9, 2026).]
Updated 22d ago
Evaluation Results
Method     Configuration              Date     OverRefusal Score
Base       Model Backbone=DeepSee...  2026.05  30.8
STAR-1     Model Backbone=DeepSee...  2026.05  28.8
SafeChain  Model Backbone=DeepSee...  2026.05  27.3
STAR-1     Model Backbone=DeepSee...  2026.05  26.3
Base       Model Backbone=DeepSee...  2026.05  23.7
SInternal  Model Backbone=DeepSee...  2026.05  23.7
SafeChain  Model Backbone=DeepSee...  2026.05  23.7
STAR-1     Model Backbone=DeepSee...  2026.05  21.7
SInternal  Model Backbone=DeepSee...  2026.05  21.7
SafeChain  Model Backbone=DeepSee...  2026.05  18.7
Base       Model Backbone=DeepSee...  2026.05  14.4
SInternal  Model Backbone=DeepSee...  2026.05  11.1