Metaphor on MUNCH
Chart: AUROC over time. Top entry: Full Rep., 98.6 AUROC (Apr 20, 2026).
Evaluation Results
Method                             Configuration              Date     AUROC
Full Rep.                          Model=Gemma2-9B, Repre...  2026.04  98.6
Full Rep.                          Model=GPT-OSS-20B, Rep...  2026.04  97.9
Subspace                           Model=Gemma2-9B, Repre...  2026.04  96.5
Full Rep.                          Model=Qwen3-8B, Repres...  2026.04  96.3
Full Representation Classifier     Backbone=Llama-3.1-8B,...  2026.04  95.1
Subspace                           Model=GPT-OSS-20B, Rep...  2026.04  93.2
one-directional concreteness axis  Backbone=Llama-3.1-8B,...  2026.04  93.2
Subspace                           Model=Qwen3-8B, Repres...  2026.04  92