Safety Reasoning Evaluation on BeaverTails 5,000 prompts (subsampled)
[Chart: Relevance. AIDSAFE: 4.68 (May 27, 2025). Available metrics: Relevance, Coherence, Completeness, CoT Faithfulness (Policy), Response Faithfulness (Policy), Response Faithfulness (CoT), Delta (%).]
Updated 1mo ago
Evaluation Results
| Method | Links | Relevance | Coherence | Completeness | CoT Faithfulness (Policy) | Response Faithfulness (Policy) | Response Faithfulness (CoT) | Delta (%) |
|---|---|---|---|---|---|---|---|---|
| AIDSAFE | Agentic Deliberation P... (2025.05) | 4.68 | 4.96 | 4.92 | 4.27 | 4.91 | 5 | - |
| LLM_ZS | Agentic Deliberation P... (2025.05) | 4.66 | 4.93 | 4.86 | 3.85 | 4.85 | 4.99 | - |