Harmfulness Classification on HarmBench (val)
[Chart: Agreement Rate by date. Top result: HarmClassifier, 82.9 (Sep 29, 2025)]
Updated 1mo ago
Evaluation Results

Method           Details                     Date      Agreement Rate
HarmClassifier   Classifier Type=Harmfu...   2025.09   82.9
HarmBench Eval   Classifier Type=Harmfu...   2025.09   82.5
LlamaGuard-4     Classifier Type=Harmfu...   2025.09   80.6
GPTFuzzer Eval   Classifier Type=Harmfu...   2025.09   74