Policy Enforcement on PhantomPolicy (human-reviewed trace labels)
[Chart: Risky-case Violation Rate. Policy-in-prompt: 40.7 (Apr 14, 2026)]
Updated 1mo ago
Evaluation Results
| Method | Strategy | Date | Risky-case Violation Rate | Accuracy | F1 Score | Precision | Recall | Safe-case Error Rate | False Positives Count | Missed Violations Count |
|---|---|---|---|---|---|---|---|---|---|---|
| Policy-in-prompt | Policy-in-prompt | 2026.04 | 40.7 | - | - | - | - | 5.7 | - | - |
| Baseline | No enforcement | 2026.04 | 95.3 | - | - | - | - | - | - | - |
| Content DLP | Content DLP | 2026.04 | - | 68.83 | 56.61 | 96.06 | 40.13 | - | - | - |
| Sentinel | Sentinel | 2026.04 | - | 92.99 | 92.71 | - | - | - | 5 | 37 |
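The Content DLP row is the only one that reports precision, recall, and F1 together, so it can be cross-checked. The sketch below assumes the benchmark uses the standard F1 definition (the harmonic mean of precision and recall) and that the reported values are percentages. The page states neither, and the function name is illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition).

    Inputs and output share the same scale, here percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Content DLP row from the results table: precision 96.06, recall 40.13.
content_dlp_f1 = f1_from_precision_recall(96.06, 40.13)
print(round(content_dlp_f1, 2))  # 56.61, matching the reported F1 Score
```

Under those assumptions the reported figures are internally consistent: high precision with low recall pulls F1 well below the precision value.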