
LLM Steering on Anthropic Model-Written Evaluations (MWE) In-distribution

Steerability: 2.363

L2S w/ W2S

[Chart: Steerability; Apr 4, 2026]
Updated 12d ago

Evaluation Results

Date      Steerability   Score 2 (unlabeled)
2026.04   2.363          91.8
2026.04   2.098          89.9
2026.04   2.071          91.8
2026.04   1.888          87.5
2026.04   1.675          85.4
2026.04   1.502          84.6
2026.04   1.493          83.3
2026.04   1.259          75.4