Confidence Estimation on MKQA Spanish es (test)
[Chart: AUROC by date. Top result: Cross-lingual Probe, AUROC 82 (May 29, 2026). Metrics available: AUROC, AUPR, Brier Score, ECE.]
Updated 2d ago
Evaluation Results
| Method | Details | Date | AUROC | AUPR | Brier Score | ECE |
|---|---|---|---|---|---|---|
| Cross-lingual Probe | Backbone=Qwen 3 8B, Ev... | 2026.05 | 82 | 53 | 19 | 20 |
| Seq. Likelihood | Backbone=Qwen 3 8B, Ev... | 2026.05 | 81 | 52 | 51 | 61 |
| Verbalized Conf. | Backbone=Qwen 3 8B, Ev... | 2026.05 | 75 | 51 | 54 | 54 |
| Mass-Mean Probe | Backbone=Qwen 3 8B, Ev... | 2026.05 | 65 | 52 | 45 | 10 |
| P(True) | Backbone=Qwen 3 8B, Ev... | 2026.05 | 30 | 13 | 19 | 19 |