Misalignment Detection on MoralChain (test)
Top entry: Linear Probe, 89.4 Accuracy
[Chart: results over time, selectable metric Accuracy or AUROC; date shown: Apr 25, 2026]
Updated 1mo ago
Evaluation Results
| Method | Method details | Links | Accuracy | AUROC |
|---|---|---|---|---|
| Linear Probe | Token=z1, Evaluation P... | 2026.04 | 89.4 | 0.951 |
| Linear Probe | Token=z2, Evaluation P... | 2026.04 | 84.7 | 0.922 |
| Linear Probe | Token=z3, Evaluation P... | 2026.04 | 78.3 | 0.874 |
| Linear Probe | Token=z4, Evaluation P... | 2026.04 | 71.2 | 0.803 |
| Linear Probe | Token=z5, Evaluation P... | 2026.04 | 64.8 | 0.724 |
| Linear Probe | Token=z6, Evaluation P... | 2026.04 | 58.3 | 0.651 |