PhantomPolicy

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Policy Violation Detection | PhantomPolicy complete benchmark world-model coverage (human-reviewed trace labels) | True Positives (TP): 58 | 5 |
| Policy Violation Detection | PhantomPolicy safe-control original | Violation Rate: 0.0333 | 5 |
| Policy Violation Detection | PhantomPolicy original (violation-ground-truth) | Violated Count: 54 | 5 |
| Policy Enforcement | PhantomPolicy (human-reviewed trace labels) | Risky-case Violation Rate: 40.7 | 2 |