Emergent Misalignment Measurement on Security General evaluation

Misalignment Score: 1.21

Persona Vectors

Aug 8, 2025
Updated 1mo ago

Evaluation Results

Date      Misalignment Score   (unlabeled)   Method   Links
2025.08   1.21                 0.25
2025.08   2.92                 0
2025.08   5.26                 8.26
2025.08   5.68                 10.98
2025.08   14.36                11.38
2025.08   40.83                7.08