LLM Routing on MMLU Pro Out-of-Domain
[Chart: STEM Score, Human Score, Social Sciences Score, Others Score. Highlighted point: ProbeDirichlet, STEM Score 65.32, Feb 12, 2026.]
Updated 4d ago
Evaluation Results
| Method | Method (details) | Links | STEM Score | Human Score | Social Sciences Score | Others Score |
|---|---|---|---|---|---|---|
| ProbeDirichlet | signal_modality=Hidden... | 2026.02 | 65.32 | 57.84 | 58.82 | 62.77 |
| SemanticEntropy | signal_modality=Verbos... | 2026.02 | 56.27 | 51.72 | 52.9 | 53.95 |
| ConfidenceMargin | signal_modality=Logit-... | 2026.02 | 54.42 | 46.97 | 54.37 | 49.52 |
| SelfAsk | signal_modality=Verbos... | 2026.02 | 53.74 | 55.86 | 56.06 | 50.91 |
| EmbeddingMLP | signal_modality=Embedd... | 2026.02 | 52.97 | 53.77 | 48.16 | 50.45 |
| MaxLogits | signal_modality=Logit-... | 2026.02 | 50.03 | 50.53 | 41.14 | 46.43 |
| Entropy | signal_modality=Logit-... | 2026.02 | 49.7 | 52.36 | 48.54 | 49.23 |