Misalignment Detection on MoralChain (train)
Best result: 97.2 Accuracy (Linear Probe)
[Chart: Accuracy over time, data point dated Apr 25, 2026; metrics: Accuracy, AUROC]
Updated 1mo ago
Evaluation Results
Method        Configuration               Date     Accuracy  AUROC
Linear Probe  Token=z1, Evaluation P...   2026.04  97.2      99.4
Linear Probe  Token=z2, Evaluation P...   2026.04  96.8      99.1
Linear Probe  Token=z3, Evaluation P...   2026.04  95.1      98.7
Linear Probe  Token=z4, Evaluation P...   2026.04  93.4      97.8
Linear Probe  Token=z5, Evaluation P...   2026.04  91.2      96.2
Linear Probe  Token=z6, Evaluation P...   2026.04  88.7      94.3