Safety Evaluation on Harmful Evaluation Queries
[Chart: Final HS, BufferLoRA HS, ReinforceLoRA HS. Highlighted entry: SFT, Final HS 75.2, May 23, 2026.]
Evaluation Results
| Method | Model | Date | Final HS | BufferLoRA HS | ReinforceLoRA HS |
|---|---|---|---|---|---|
| SFT | LLaMA3-8B-Instruct | 2026.05 | 75.2 | - | - |
| SFT | LLaMA3-8B-LAT | 2026.05 | 74 | - | - |
| SFT | LLaMA3-8B-ReFAT | 2026.05 | 74 | - | - |
| SFT | LLaMA2-13B-Chat | 2026.05 | 33.2 | - | - |
| Base | LLaMA3-8B-Instruct | 2026.05 | 19.2 | - | - |
| Base | LLaMA3-8B-ReFAT | 2026.05 | 16.2 | - | - |
| Buffer-and-Reinforce (ours) | LLaMA3-8B-Instruct | 2026.05 | 8.4 | 88.2 | 4.3 |
| Base | LLaMA2-13B-Chat | 2026.05 | 6.4 | - | - |
| Buffer-and-Reinforce (ours) | LLaMA2-13B-Chat | 2026.05 | 4.3 | 85.7 | 1.5 |
| Buffer-and-Reinforce (ours) | LLaMA3-8B-ReFAT | 2026.05 | 3.5 | 87.3 | 2.2 |
| Buffer-and-Reinforce (ours) | LLaMA3-8B-LAT | 2026.05 | 3.1 | 87.6 | 2.1 |