Overrefusal Evaluation on Over-refusal Evaluation Suite (OB, FB)
[Chart: Overrefusal Rate (OB), Overrefusal Rate (FB), and Average Overrefusal Rate by method. Highlighted point: WaltzRL, Overrefusal Rate (OB) 9.9, Oct 9, 2025.]
Evaluation Results
| Method | Details | Date | Overrefusal Rate (OB) | Overrefusal Rate (FB) | Average Overrefusal Rate |
| --- | --- | --- | --- | --- | --- |
| WaltzRL | Method Identity=Method 7 | 2025.10 | 9.9 | 5.4 | 7.6 |
| Single-model RL | Description=Traditiona... | 2025.10 | 11.9 | 5.2 | 8.6 |
| Inference-time collaboration | Backbone=Llama-3.1-8B-... | 2025.10 | 18.3 | 7 | 12.7 |
| Single-model RL + Safeguard | Safeguard=Llama Guard 4 | 2025.10 | 20.7 | 9.2 | 14.9 |
| Oracle label-converted feedback | Feedback Source=Ground... | 2025.10 | 28.2 | 5 | 16.6 |
| Baseline response | Backbone=Llama-3.1-8B-... | 2025.10 | 45.3 | 6 | 25.7 |
| Baseline response + Safeguard | Backbone=Llama-3.1-8B-... | 2025.10 | 48.7 | 11 | 29.8 |
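The Average Overrefusal Rate column appears to be the arithmetic mean of the OB and FB rates; the page does not state this, so treat it as an assumption. The sketch below recomputes the mean from the reported figures and checks it against the reported average. The names `rows` and `mean_rate` are illustrative, and the 0.1 tolerance is an assumption to allow for the reported values having been rounded from unrounded inputs.

```python
# (OB rate, FB rate, reported average) per method, copied from the results above.
rows = {
    "WaltzRL": (9.9, 5.4, 7.6),
    "Single-model RL": (11.9, 5.2, 8.6),
    "Inference-time collaboration": (18.3, 7.0, 12.7),
    "Single-model RL + Safeguard": (20.7, 9.2, 14.9),
    "Oracle label-converted feedback": (28.2, 5.0, 16.6),
    "Baseline response": (45.3, 6.0, 25.7),
    "Baseline response + Safeguard": (48.7, 11.0, 29.8),
}


def mean_rate(ob: float, fb: float) -> float:
    """Arithmetic mean of the two overrefusal rates (assumed definition)."""
    return (ob + fb) / 2


for name, (ob, fb, reported) in rows.items():
    recomputed = mean_rate(ob, fb)
    # Assumed tolerance: reported figures are rounded to one decimal place.
    consistent = abs(recomputed - reported) <= 0.1
    print(f"{name}: recomputed={recomputed:.2f} reported={reported} consistent={consistent}")
```

Under this assumption every row agrees with its reported average to within the tolerance; small differences (for example 7.65 recomputed against 7.6 reported for WaltzRL) are what rounding of the underlying values would produce.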