Policy Corruption Evaluation on GPT-4o mini
[Chart: scores by method, Dec 20, 2025. Highlighted value: HPM, Compliance Score 3.53. Series: Compliance Score, Trustfulness Score, Recklessness Score, Harm Violation Score, Value Drift Score, Self Doubt Score, Confusion Score.]
Updated 1mo ago
Evaluation Results
| Method | Victim Model | Date | Compliance Score | Trustfulness Score | Recklessness Score | Harm Violation Score | Value Drift Score | Self Doubt Score | Confusion Score |
|---|---|---|---|---|---|---|---|---|---|
| HPM | GPT-4o-mini | 2025.12 | 3.53 | 3.21 | 2.95 | 3.6 | 2.81 | 2.9 | 3.04 |
| PAP | GPT-4o-mini | 2025.12 | 2.25 | 2.01 | 1.85 | 2.22 | 1.25 | 1.55 | 1.43 |
| CoA | GPT-4o-mini | 2025.12 | 1.81 | 1.53 | 1.3 | 1.92 | 0.88 | 0.81 | 1.05 |
| PAIR | GPT-4o-mini | 2025.12 | 1.22 | 1.01 | 0.85 | 1.03 | 0.61 | 0.42 | 0.55 |
| AutoDAN | GPT-4o-mini | 2025.12 | 1.03 | 0.84 | 0.62 | 0.81 | 0.45 | 0.22 | 0.31 |