OR-Bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Refusal Evaluation | OR-Bench | Toxic Refusal Rate | 99.6 | 40 |
| Over-refusal Rate Analysis | OR-Bench | Over-refusal Rate | 16.68 | 33 |
| Safety Guardrailing | OR-Bench | False Positive Rate | 0 | 26 |
| Over-refusal Evaluation | OR-Bench (boundary cases) | OR-FPR | 1.7 | 18 |
| Over-refusal Detection | OR-Bench | AUROC | 93.42 | 18 |
| Benign Prompt Filtering | OR-Bench | False Positive Rate | 0 | 12 |
| Guardrail False Positive Rate Estimation | OR-Bench benign prompts | False Positive Rate | 0 | 8 |
| Safety Classification | OR-Bench | F1 Score | 83.1 | 8 |
| Refusal Evaluation | OR-Bench Toxic | Refusal Rate | 94.66 | 7 |
| Safety and Utility Evaluation | OR-Bench | HarmR | 5.3 | 4 |
| Adversarial Attack | OR-Bench unsafe inputs | ASR | 88 | 4 |
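
The table above reports metrics without defining them. The sketch below is a hedged illustration that computes the textbook forms of these metrics on made-up toy data:

- **Rate-style metrics:** refusal rate, over-refusal rate and attack success rate (ASR).
- **Classifier metrics:** false positive rate, F1 and AUROC.

All function names and data here are hypothetical. This is not the OR-Bench scoring code. Two further points are assumptions rather than facts taken from this page: how a response is judged to be a refusal, and whether results are reported as percentages. OR-FPR and HarmR are not covered because their definitions are not given here.

```python
from typing import Sequence

# Hypothetical helpers showing textbook metric definitions; NOT the OR-Bench
# scoring code. Values are returned as percentages (an assumption).


def rate(flags: Sequence[bool]) -> float:
    """Percentage of True flags.

    Refusal rate: flags mark "model refused" over a prompt set.
    Over-refusal rate: the same, restricted to benign prompts.
    ASR: flags mark "attack succeeded" over adversarial prompts.
    """
    if len(flags) == 0:
        raise ValueError("rate() needs at least one item")
    return 100.0 * sum(flags) / len(flags)


def false_positive_rate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Percentage of actual negatives (benign, y_true=False) flagged as unsafe."""
    flags_on_negatives = [p for t, p in zip(y_true, y_pred) if not t]
    return rate(flags_on_negatives)


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for the positive (unsafe) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        # Common convention: F1 is 0 when there are no true positives.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


def auroc(y_true: Sequence[bool], scores: Sequence[float]) -> float:
    """Chance that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("auroc() needs both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return 100.0 * wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy data: True = prompt is actually unsafe.
    y_true = [True, True, False, False, False]
    y_pred = [True, False, True, False, False]    # hard safe/unsafe decisions
    scores = [0.9, 0.4, 0.6, 0.2, 0.1]            # classifier "unsafe" scores
    refused_benign = [True, False, False, False]  # refusals on benign prompts

    print(f"over-refusal rate: {rate(refused_benign):.2f}")   # 25.00
    print(f"FPR: {false_positive_rate(y_true, y_pred):.2f}")  # 33.33
    print(f"F1: {f1_score(y_true, y_pred):.2f}")              # 50.00
    print(f"AUROC: {auroc(y_true, scores):.2f}")              # 83.33
```

The pairwise AUROC loop is O(n·m) and is only meant for small examples. The rank-based (Mann-Whitney U) formulation, or scikit-learn's `roc_auc_score`, gives the same value more efficiently.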