
Safety Assessment on Safety Prompts (randomly selected 200 samples per field)

Insensitivity Score: 1.5

llama2 -> CP -> FT + 0.5 chat vector
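
The method name above reads as a pipeline that ends by adding a chat vector scaled by 0.5. A minimal sketch of that last step follows, under two assumptions not stated on this page: that CP and FT stand for continual pre-training and fine-tuning, and that the chat vector is the element-wise difference between a chat-tuned model's weights and its base model's weights. The function names and the toy weights are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def chat_vector(chat: Weights, base: Weights) -> Weights:
    """Assumed definition: per-parameter difference (chat - base)."""
    return {
        name: [c - b for c, b in zip(chat[name], base[name])]
        for name in base
    }


def add_chat_vector(target: Weights, vector: Weights, scale: float = 0.5) -> Weights:
    """Add the scaled chat vector to the target (e.g. CP + FT) weights."""
    return {
        name: [t + scale * v for t, v in zip(target[name], vector[name])]
        for name in target
    }


# Hypothetical toy weights, chosen so the float arithmetic is exact.
base = {"w": [1.0, 2.0]}
chat = {"w": [1.5, 1.0]}
finetuned = {"w": [0.0, 4.0]}

merged = add_chat_vector(finetuned, chat_vector(chat, base), scale=0.5)
print(merged)
```

With real checkpoints the same arithmetic would run over every tensor in the models' parameter dictionaries, which requires the three models to share one architecture and parameter naming.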

[Chart: Insensitivity Score; data point dated Oct 7, 2023]
Updated 4d ago

Evaluation Results

Method  Links
2023.10  1.500.50.500.50
2023.10  20.510.5000
2023.10  2.500.50.5010.5
2023.10  5231010
2023.10  7.542.52061.5
2023.10  1311.514.52.50112.5
2023.10  13.5385.51.56.55
2023.10  47.528.517164.59
2023.10  662437.51.5115.54