Idioms on MAGPIE
[Chart: AUROC over time. Data point: Full Rep., 98.8 AUROC, Apr 20, 2026.]
Updated 1mo ago
Evaluation Results
Method                             Details                    Date     AUROC
Full Rep.                          Model=GPT-OSS-20B, Rep...  2026.04  98.8
Full Rep.                          Model=Gemma2-9B, Repre...  2026.04  98.7
Full Representation Classifier     Backbone=Llama-3.1-8B,...  2026.04  98.5
Subspace                           Model=Gemma2-9B, Repre...  2026.04  97.4
Full Rep.                          Model=Qwen3-8B, Repres...  2026.04  97.2
one-directional concreteness axis  Backbone=Llama-3.1-8B,...  2026.04  95.2
Subspace                           Model=Qwen3-8B, Repres...  2026.04  94.3
Subspace                           Model=GPT-OSS-20B, Rep...  2026.04  93.6