Concept Steering on GPT-2 (3 concepts x 4 layers x 25 contexts) (test)

Median Off-target KL Ratio: 1.32

Euclidean

[Chart: axis ticks 1.2724, 1.5937, 1.915, 2.2363; date May 17, 2026]
Updated 15d ago

Evaluation Results

Method    Links
2026.05   1.32   72.9   1096
2026.05   1.42   77.6   1076
2026.05   1.44   78.3   1083
2026.05   1.5    75.8   1066
2026.05   1.5    72.2   172
2026.05   1.52   74     1077
2026.05   1.54   76.1   1071
2026.05   1.58   78     1082
2026.05   1.61   78.4   10102
2026.05   1.65   78.9   1090
2026.05   1.91   80.6   10108
2026.05   2.51   81.4   10113