
OR-Bench

Benchmarks

Task Name                   Dataset Name            SOTA Result             Trend
Overrefusal Detection       OR-Bench                AUROC 93.42             18
Over-refusal rate analysis  OR-Bench                Over-refusal Rate 24.1  15
Adversarial Attack          OR-Bench unsafe inputs  ASR 88                  4
Safety Classification       OR-Bench                F1 Score 77             3