Confidence Estimation on Global-MMLU Spanish es (test)
[Chart: best AUROC, 74 (Cross-lingual Probe), May 29, 2026. Metrics available: AUROC, AUPR, Brier Score, ECE.]
Evaluation Results
| Method | Details | Date | AUROC | AUPR | Brier Score | ECE |
|---|---|---|---|---|---|---|
| Cross-lingual Probe | Backbone=Qwen 3 8B, Ev... | 2026.05 | 74 | 52 | 0.23 | 0.21 |
| Verbalized Conf. | Backbone=Qwen 3 8B, Ev... | 2026.05 | 73 | 55 | 0.52 | 0.54 |
| Seq. Likelihood | Backbone=Qwen 3 8B, Ev... | 2026.05 | 70 | 51 | 0.53 | 0.59 |
| Mass-Mean Probe | Backbone=Qwen 3 8B, Ev... | 2026.05 | 59 | 50 | 0.41 | 0.12 |
| P(True) | Backbone=Qwen 3 8B, Ev... | 2026.05 | 41 | 23 | 0.26 | 0.26 |
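
The four reported metrics can all be derived from per-question confidence scores paired with correctness labels. The following is a minimal sketch, not the benchmark's own evaluation code (which is not shown on this page). It assumes binary correctness labels, confidences in [0, 1], no tied confidences for AUPR, ten equal-width bins for ECE, and AUPR estimated as average precision. All function names are hypothetical. The page appears to report AUROC and AUPR as percentages and Brier Score and ECE on a 0 to 1 scale, so the first two would be multiplied by 100 to compare.

```python
from typing import Sequence


def auroc(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a correct answer outranks an incorrect one (ties count half)."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(conf: Sequence[float], correct: Sequence[int]) -> float:
    """AUPR estimated as the mean precision at the rank of each correct answer."""
    order = sorted(range(len(conf)), key=lambda i: -conf[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if correct[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one correct answer")
    return total / hits


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared error between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)


def ece(conf: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width bins (the bin count is an assumption)."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        bins[idx].append((c, y))
    total = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(y for _, y in members) / len(members)
            total += len(members) / len(conf) * abs(accuracy - mean_conf)
    return total


# Toy data for illustration only; these are not benchmark results.
toy_conf = [0.95, 0.75, 0.65, 0.25]
toy_correct = [1, 0, 1, 0]
print(auroc(toy_conf, toy_correct), average_precision(toy_conf, toy_correct))
print(brier_score(toy_conf, toy_correct), ece(toy_conf, toy_correct))
```

Higher is better for AUROC and AUPR, which measure how well confidence ranks correct answers above incorrect ones. Lower is better for Brier Score and ECE, which measure calibration. This is why a method can rank low on AUROC yet show the lowest ECE in the table, as Mass-Mean Probe does.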