SOTA Agentic Oversight benchmarks and papers with code | Wizwand
Agentic Oversight
Benchmarks
Dataset Name        SOTA Method   Metric               Score   Results   Last Updated
Deceivers           FORMALJUDGE   Detection Accuracy   96.92   42        1mo ago
VitaBench           FORMALJUDGE   Detection Accuracy   82.13   42        1mo ago
Agent-SafetyBench   FORMALJUDGE   Detection Accuracy   84.06   42        1mo ago