Harmlessness on XsTest
[Chart: Refusal Rate over time. Labeled point: Self-Rewarding, 99, Mar 23, 2025.]
Updated 1mo ago
Evaluation Results
| Method | Backbone | Date | Refusal Rate |
|---|---|---|---|
| Self-Rewarding | Llama-3.1-8B-... | 2025.03 | 99 |
| STAIR | Llama-3.1-8B-... | 2025.03 | 99 |
| DR-IRL | Llama-3.1-8B-... | 2025.03 | 99 |
| STAIR | Qwen-2-7B-Ins... | 2025.03 | 99 |
| DR-IRL | Qwen-2-7B-Ins... | 2025.03 | 98.5 |
| IRL | Llama-3.1-8B-... | 2025.03 | 96.5 |
| Self-Rewarding | Qwen-2-7B-Ins... | 2025.03 | 96 |
| IRL | Qwen-2-7B-Ins... | 2025.03 | 95 |
| SFT | Llama-3.1-8B-... | 2025.03 | 94.5 |
| GRPO | Llama-3.1-8B-... | 2025.03 | 91.5 |
| GRPO | Qwen-2-7B-Ins... | 2025.03 | 89.5 |
| SACPO | Llama-3.1-8B-... | 2025.03 | 88.5 |
| Base | Llama-3.1-8B-... | 2025.03 | 88 |
| CoT | Llama-3.1-8B-... | 2025.03 | 87 |
| DPO | Llama-3.1-8B-... | 2025.03 | 86 |
| SFT | Qwen-2-7B-Ins... | 2025.03 | 84 |
| SACPO | Qwen-2-7B-Ins... | 2025.03 | 75 |
| Base | Qwen-2-7B-Ins... | 2025.03 | 72.5 |
| CoT | Qwen-2-7B-Ins... | 2025.03 | 70 |
| DPO | Qwen-2-7B-Ins... | 2025.03 | 69 |