Safety Evaluation on WildChat
[Chart: Safe@1 over time. Plotted point: UnsafeChain, 97.5 Safe@1, Jul 29, 2025.]
Updated 19d ago
Evaluation Results
Method       Configuration              Date     Safe@1
-----------  -------------------------  -------  ------
UnsafeChain  Base Model=R1-7B, Data...  2025.07  97.5
UnsafeChain  Base Model=R1-8B, Data...  2025.07  96.5
UnsafeChain  Base Model=R1-7B, Data...  2025.07  96
UnsafeChain  Base Model=R1-7B, Data...  2025.07  96
UnsafeChain  Base Model=R1-1.5B, Da...  2025.07  96
SafeChain    Base Model=R1-8B           2025.07  95.5
STAR-1       Base Model=R1-8B           2025.07  95.5
R1           Model Size=7B              2025.07  95.5
STAR-1       Base Model=R1-7B           2025.07  95.5
R1           Model Size=8B              2025.07  95
UnsafeChain  Base Model=R1-8B, Data...  2025.07  95
SafeChain    Base Model=R1-7B           2025.07  95
R1           Model Size=1.5B            2025.07  95
SafeChain    Base Model=R1-1.5B         2025.07  95
UnsafeChain  Base Model=R1-1.5B, Da...  2025.07  95
UnsafeChain  Base Model=R1-8B, Data...  2025.07  94.5
UnsafeChain  Base Model=R1-1.5B, Da...  2025.07  93.5
STAR-1       Base Model=R1-1.5B         2025.07  92
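
The results listed above mix several methods and base-model sizes, so a per-group summary can make them easier to compare. The following is a minimal sketch in Python, not part of the benchmark itself: the names `RESULTS`, `base_size`, `best_by_method`, and `best_by_size` are hypothetical helpers introduced here for illustration, and configuration strings that the page truncates (for example "Data...") are copied verbatim rather than completed.

```python
# Rows copied from the leaderboard: (method, configuration as displayed, Safe@1).
# Truncated configurations ("Data...", "Da...") are kept exactly as shown.
RESULTS = [
    ("UnsafeChain", "Base Model=R1-7B, Data...", 97.5),
    ("UnsafeChain", "Base Model=R1-8B, Data...", 96.5),
    ("UnsafeChain", "Base Model=R1-7B, Data...", 96.0),
    ("UnsafeChain", "Base Model=R1-7B, Data...", 96.0),
    ("UnsafeChain", "Base Model=R1-1.5B, Da...", 96.0),
    ("SafeChain", "Base Model=R1-8B", 95.5),
    ("STAR-1", "Base Model=R1-8B", 95.5),
    ("R1", "Model Size=7B", 95.5),
    ("STAR-1", "Base Model=R1-7B", 95.5),
    ("R1", "Model Size=8B", 95.0),
    ("UnsafeChain", "Base Model=R1-8B, Data...", 95.0),
    ("SafeChain", "Base Model=R1-7B", 95.0),
    ("R1", "Model Size=1.5B", 95.0),
    ("SafeChain", "Base Model=R1-1.5B", 95.0),
    ("UnsafeChain", "Base Model=R1-1.5B, Da...", 95.0),
    ("UnsafeChain", "Base Model=R1-8B, Data...", 94.5),
    ("UnsafeChain", "Base Model=R1-1.5B, Da...", 93.5),
    ("STAR-1", "Base Model=R1-1.5B", 92.0),
]


def base_size(config):
    """Extract the size tag: 'Base Model=R1-7B, Data...' or 'Model Size=7B' -> '7B'."""
    first_field = config.split(",")[0]          # drop anything after the first comma
    value = first_field.split("=", 1)[1]        # text after 'Base Model=' or 'Model Size='
    return value.split("-")[-1]                 # 'R1-7B' -> '7B'; '7B' stays '7B'


def best_by_method(rows):
    """Highest Safe@1 reported for each method name."""
    best = {}
    for method, _config, score in rows:
        best[method] = max(score, best.get(method, float("-inf")))
    return best


def best_by_size(rows):
    """Highest Safe@1 reported for each base-model size."""
    best = {}
    for _method, config, score in rows:
        size = base_size(config)
        best[size] = max(score, best.get(size, float("-inf")))
    return best
```

Calling `best_by_method(RESULTS)` gives 97.5 for UnsafeChain and 95.5 for each of SafeChain, STAR-1, and R1; `best_by_size(RESULTS)` gives 97.5 for 7B, 96.5 for 8B, and 96.0 for 1.5B. Because the UnsafeChain configurations are truncated on the page, this sketch cannot distinguish between its data variants.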