Safety Evaluation on Manual Evaluation Safety Dataset
[Chart: Average Safety Score. Top entry: M_Self-MOA, 3.83 (Mar 7, 2026).]
Evaluation Results
| Method     | Base Model | Date    | Average Safety Score |
|------------|------------|---------|----------------------|
| M_Self-MOA | qwen       | 2026.03 | 3.83                 |
| M_PKU-RLHF | llama      | 2026.03 | 3.8                  |
| M_Self-MOA | llama      | 2026.03 | 3.67                 |
| M_PKU-RLHF | gemma-3    | 2026.03 | 3.67                 |
| M_PKU-RLHF | qwen       | 2026.03 | 3.37                 |
| M_Self-MOA | gemma-3    | 2026.03 | 3.3                  |
| M_Self-MOA | gemma-2    | 2026.03 | 3.2                  |
| M_base     | gemma-3    | 2026.03 | 2.97                 |
| M_base     | llama      | 2026.03 | 2.77                 |
| M_base     | qwen       | 2026.03 | 2.43                 |
| M_PKU-RLHF | gemma-2    | 2026.03 | 2.13                 |
| M_base     | gemma-2    | 2026.03 | 2.1                  |