LLM Steering on Anthropic Model-Written Evaluations (MWE) In-distribution
[Chart: Steerability; highlighted entry L2S w/ W2S, 2.363, Apr 4, 2026. Metrics: Steerability, Proportion Steerable Examples.]
Updated 12d ago
Evaluation Results
| Method | Backbone | Date | Steerability | Proportion Steerable Examples |
|---|---|---|---|---|
| L2S w/ W2S | Llama-2-7B-ch... | 2026.04 | 2.363 | 91.8 |
| L2S | Llama-2-7B-ch... | 2026.04 | 2.098 | 89.9 |
| L2S w/ W2S | Qwen-1.5-14B-... | 2026.04 | 2.071 | 91.8 |
| L2S | Qwen-1.5-14B-... | 2026.04 | 1.888 | 87.5 |
| CAA w/ W2S | Qwen-1.5-14B-... | 2026.04 | 1.675 | 85.4 |
| CAA w/ W2S | Llama-2-7B-ch... | 2026.04 | 1.502 | 84.6 |
| CAA | Qwen-1.5-14B-... | 2026.04 | 1.493 | 83.3 |
| CAA | Llama-2-7B-ch... | 2026.04 | 1.259 | 75.4 |