Confidence Estimation on MKQA Russian / ru (test)
[Chart: AUROC over time. Leading method: Cross-lingual Probe, AUROC 80 (May 29, 2026). Selectable metrics: AUROC, AUPR, Brier Score, ECE.]
Updated 2d ago
Evaluation Results
Method               Configuration               Date     AUROC  AUPR  Brier Score  ECE
Cross-lingual Probe  Backbone=Qwen 3 8B, Ev...   2026.05  80     43    15           16
Seq. Likelihood      Backbone=Qwen 3 8B, Ev...   2026.05  79     35    49           62
Mass-Mean Probe      Backbone=Qwen 3 8B, Ev...   2026.05  73     51    40           12
Verbalized Conf.     Backbone=Qwen 3 8B, Ev...   2026.05  72     36    55           56
P(True)              Backbone=Qwen 3 8B, Ev...   2026.05  43     12    12           12
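
For readers unfamiliar with the four columns, the following is a minimal sketch of how such metrics are commonly computed from per-question confidence scores and binary correctness labels. It is not the leaderboard's evaluation code. The function names, the 10-bin equal-width ECE, and the toy data are assumptions made for illustration, and the table appears to report values multiplied by 100, which is also an assumption.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random correct answer gets a higher score than a
    random incorrect one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """A common AUPR estimate: mean of precision at each correct answer,
    scanning from highest to lowest score (tied scores keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one correct example")
    return total / hits


def brier_score(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared gap between confidence and correctness (lower is better)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def ece(scores: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width bins over [0, 1]
    (lower is better). The bin count is an illustrative choice."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    err = 0.0
    for b in bins:
        if b:
            mean_conf = sum(s for s, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            err += len(b) / len(scores) * abs(mean_conf - accuracy)
    return err


# Toy data (hypothetical): confidences in [0, 1] and 1 = answer was correct.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(auroc(scores, labels), average_precision(scores, labels))
print(brier_score(scores, labels), ece(scores, labels))
```

AUROC and AUPR depend only on how the scores rank the answers, while Brier Score and ECE also depend on the scores' absolute values. That is one reason a method can rank well on the first two columns and poorly on the last two, as Seq. Likelihood does in the table.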