Literal-to-Figurative Steering on Human Evaluation (sample of 100)
[Chart: Successful Sentences. Highlighted point: GPT-OSS-20B, 15, Apr 20, 2026]
Updated 1mo ago
Evaluation Results
Method         Steering status   Date      Successful Sentences
GPT-OSS-20B    Steer             2026.04   15
Llama-3.1-8B   Steer             2026.04   12
Qwen3-8B       Steer             2026.04   10
Gemma2-9B      Steer             2026.04   9
Llama-3.1-8B   Unsteer           2026.04   0
Qwen3-8B       Unsteer           2026.04   0
Gemma2-9B      Unsteer           2026.04   0
GPT-OSS-20B    Unsteer           2026.04   0