Generative AI Output Safety on HarmBench and AdvBench (test)

82.88 Safe Rate

Reliable Consensus Sampling

Dec 31, 2025
Updated 4d ago

Evaluation Results

Date     Safe Rate  (unlabeled)  (unlabeled)
2025.12  82.88      0            24.26
2025.12  81.49      0            22.21
2025.12  77.42      0            21.63
2025.12  67.9       0            18.42
2025.12  20.39      71.79        22.6
2025.12  15.83      72.52        26.73
2025.12  12.56      75.75        23.59
2025.12  1.45       82.95        21.43