Safety Reasoning Evaluation on BeaverTails 5,000 prompts (subsampled)
[Chart: Relevance score by date. AIDSAFE: 4.68 (May 27, 2025). Selectable metrics: Relevance, Coherence, Completeness, CoT Faithfulness (Policy), Response Faithfulness (Policy), Response Faithfulness (CoT), Delta (%).]
Evaluation Results
Method | Paper | Date | Relevance | Coherence | Completeness | CoT Faithfulness (Policy) | Response Faithfulness (Policy) | Response Faithfulness (CoT) | Delta (%)
AIDSAFE | Agentic Deliberation P... | 2025.05 | 4.68 | 4.96 | 4.92 | 4.27 | 4.91 | 5 | -
LLM_ZS | Agentic Deliberation P... | 2025.05 | 4.66 | 4.93 | 4.86 | 3.85 | 4.85 | 4.99 | -