Emergent Misalignment Measurement on Security General evaluation
Top entry: Persona Vectors, Misalignment Score 1.21 (Aug 8, 2025)
[Chart: Misalignment Score and Incoherence Score]
Evaluation Results
| Method | Model | Date | Misalignment Score | Incoherence Score |
| --- | --- | --- | --- | --- |
| Persona Vectors | Qwen2.5-32B | 2025.08 | 1.21 | 0.25 |
| KL | Qwen2.5-32B | 2025.08 | 2.92 | 0 |
| Interleaving | Qwen2.5-32B | 2025.08 | 5.26 | 8.26 |
| Interleaving++ | Qwen2.5-32B | 2025.08 | 5.68 | 10.98 |
| Interleaving+ | Qwen2.5-32B | 2025.08 | 14.36 | 11.38 |
| Misaligned | Qwen2.5-32B | 2025.08 | 40.83 | 7.08 |