
OR-Bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Refusal Evaluation | OR-Bench | Toxic Refusal Rate 99.6 | 40
Safety Guardrailing | OR-Bench | False Positive Rate 0 | 26
Overrefusal Detection | OR-Bench | AUROC 93.42 | 18
Over-refusal rate analysis | OR-Bench | Over-refusal Rate 24.1 | 15
Benign Prompt Filtering | OR-Bench | False Positive Rate 0 | 12
Guardrail False Positive Rate Estimation | OR-Bench benign prompts | False Positive Rate 0 | 8
Refusal Evaluation | OR-Bench Toxic | Refusal Rate 94.66 | 7
Safety and Utility Evaluation | OR-Bench | HarmR 5.3 | 4
Adversarial Attack | OR-Bench unsafe inputs | ASR 88 | 4
Safety Classification | OR-Bench | F1 Score 77 | 3