
Harmful Question Answering on BeaverTails HarmfulQA (1k and 10k samples)

Figure: Avg Harmfulness Score over time (series label: Random)

Evaluation Results

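| Method | Links | Date | Avg Harmfulness Score |
|---|---|---|---|
|  |  | 2024.05 | 0.00 |
|  |  | 2024.05 | 0.05 |
|  |  | 2024.05 | 0.05 |
|  |  | 2024.05 | 0.05 |
|  |  | 2024.05 | 0.05 |
|  |  | 2024.05 | 0.05 |
|  |  | 2024.05 | 0.05 |
|  |  | 2024.05 | 0.05 |
|  |  | 2024.05 | 0.07 |
|  |  | 2024.05 | 0.08 |
|  |  | 2024.05 | 0.08 |
|  |  | 2024.05 | 0.10 |
|  |  | 2024.05 | 0.11 |
|  |  | 2024.05 | 0.12 |
|  |  | 2024.05 | 0.12 |
|  |  | 2024.05 | 0.13 |
|  |  | 2024.05 | 0.23 |
|  |  | 2024.05 | 0.24 |
|  |  | 2024.05 | 0.26 |
|  |  | 2024.05 | 0.28 |
|  |  | 2024.05 | 0.28 |
|  |  | 2024.05 | 0.37 |
|  |  | 2024.05 | 0.38 |
|  |  | 2024.05 | 0.46 |
|  |  | 2024.05 | 0.47 |
|  |  | 2024.05 | 0.47 |
|  |  | 2024.05 | 0.49 |
|  |  | 2024.05 | 0.52 |
|  |  | 2024.05 | 0.58 |
|  |  | 2024.05 | 0.64 |
|  |  | 2024.05 | 0.66 |
|  |  | 2024.05 | 0.68 |
|  |  | 2024.05 | 0.70 |
|  |  | 2024.05 | 0.70 |
|  |  | 2024.05 | 0.72 |
|  |  | 2024.05 | 0.72 |
|  |  | 2024.05 | 0.72 |
|  |  | 2024.05 | 0.72 |
|  |  | 2024.05 | 0.73 |
|  |  | 2024.05 | 0.73 |
|  |  | 2024.05 | 0.73 |
|  |  | 2024.05 | 0.73 |
|  |  | 2024.05 | 0.73 |
|  |  | 2024.05 | 0.74 |
|  |  | 2024.05 | 0.74 |
|  |  | 2024.05 | 0.74 |
|  |  | 2024.05 | 0.74 |
|  |  | 2024.05 | 0.74 |
|  |  | 2024.05 | 0.75 |
|  |  | 2024.05 | 0.75 |
|  |  | 2024.05 | 0.75 |
|  |  | 2024.05 | 0.75 |
|  |  | 2024.05 | 0.75 |
|  |  | 2024.05 | 0.76 |
|  |  | 2024.05 | 0.76 |
|  |  | 2024.05 | 0.76 |
|  |  | 2024.05 | 0.76 |
|  |  | 2024.05 | 0.77 |
|  |  | 2024.05 | 0.77 |
|  |  | 2024.05 | 0.77 |
|  |  | 2024.05 | 0.82 |
|  |  | 2024.05 | 0.84 |
|  |  | 2024.05 | 0.86 |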
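The page does not say how the average harmfulness score is computed. One common reading of a score in the 0 to 1 range is the fraction of model responses that a judge (for example, a moderation classifier) labels harmful. The sketch below works under that assumption. The helper name `avg_harmfulness` and the 0/1 labeling scheme are hypothetical and are not this leaderboard's documented procedure.

```python
from typing import Sequence


def avg_harmfulness(flags: Sequence[int]) -> float:
    """Mean of per-response harmfulness labels (hypothetical helper).

    Assumption: each entry is 1 if a judge flagged the response as
    harmful and 0 otherwise, so the result is the fraction of responses
    flagged harmful, a value between 0 and 1.
    """
    if not flags:
        raise ValueError("need at least one labeled response")
    if any(f not in (0, 1) for f in flags):
        raise ValueError("labels must be 0 or 1")
    return sum(flags) / len(flags)


# Example: 3 of 4 hypothetical responses flagged harmful.
print(avg_harmfulness([1, 0, 1, 1]))  # 0.75
```

The table reports scores to two decimals, so a value like this would be shown after `round(score, 2)`.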