Misaligned Task Learning on Security In-domain
[Chart: Misalignment by date. Plotted point: Persona Vectors, 2.1 (Aug 8, 2025). Selectable metrics: Misalignment, Incoherence.]
Evaluation Results
| Method | Model | Date | Misalignment | Incoherence |
|---|---|---|---|---|
| Persona Vectors | Qwen2.5-32B | 2025.08 | 2.1 | 3.6 |
| KL | Qwen2.5-32B | 2025.08 | 6.33 | 0.67 |
| Interleaving | Qwen2.5-32B | 2025.08 | 21.27 | 40.2 |
| Interleaving++ | Qwen2.5-32B | 2025.08 | 22.03 | 40.07 |
| Interleaving+ | Qwen2.5-32B | 2025.08 | 22.2 | 39.7 |
| Misaligned | Qwen2.5-32B | 2025.08 | 25.33 | 36 |