Figurative-to-Literal Steering on Human Evaluation (sample of 100)
[Chart: Successful Sentences Count over time. Top result: GPT-OSS-20B, 75 (Apr 20, 2026)]
Updated 1mo ago
Evaluation Results
Method         Steering status   Date      Successful Sentences Count
GPT-OSS-20B    Steer             2026.04   75
Llama-3.1-8B   Steer             2026.04   71
Qwen3-8B       Steer             2026.04   68
Gemma2-9B      Steer             2026.04   67
Gemma2-9B      Unsteer           2026.04   52
Qwen3-8B       Unsteer           2026.04   45
Llama-3.1-8B   Unsteer           2026.04   42
GPT-OSS-20B    Unsteer           2026.04   39
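
The gap between the Steer and Unsteer rows for each model can be worked out directly from the counts listed above. The following is a minimal sketch of that arithmetic, not part of the benchmark itself. The dictionary layout and the helper name `steering_gain` are illustrative assumptions, and the counts are assumed to be out of the 100 sampled sentences named in the title.

```python
# Successful sentence counts copied from the evaluation results above.
# Assumption: each count is out of the 100 sampled sentences.
results = {
    "GPT-OSS-20B": {"Steer": 75, "Unsteer": 39},
    "Llama-3.1-8B": {"Steer": 71, "Unsteer": 42},
    "Qwen3-8B": {"Steer": 68, "Unsteer": 45},
    "Gemma2-9B": {"Steer": 67, "Unsteer": 52},
}


def steering_gain(counts):
    """Hypothetical helper: Steer count minus Unsteer count for one model."""
    return counts["Steer"] - counts["Unsteer"]


# Per-model difference between the steered and unsteered counts.
gains = {model: steering_gain(counts) for model, counts in results.items()}

# Print models from largest to smallest gain.
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: +{gain}")
```

Under these assumptions the model with the highest steered count, GPT-OSS-20B, also has the lowest unsteered count, so it shows the largest difference between the two conditions.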