Confidence Estimation on Global-MMLU Russian (test)
[Chart: AUROC by method. Highlighted entry: Seq. Likelihood, AUROC 75. Date shown: May 29, 2026. Selectable metrics: AUROC, AUPR, Brier Score, ECE.]
Evaluation Results

| Method | Details | Date | AUROC | AUPR | Brier Score | ECE |
|---|---|---|---|---|---|---|
| Seq. Likelihood | Backbone=Qwen 3 8B, Ev... | 2026.05 | 75 | 47 | 54 | 62 |
| Verbalized Conf. | Backbone=Qwen 3 8B, Ev... | 2026.05 | 73 | 49 | 51 | 50 |
| Cross-lingual Probe | Backbone=Qwen 3 8B, Ev... | 2026.05 | 72 | 43 | 39 | 42 |
| Mass-Mean Probe | Backbone=Qwen 3 8B, Ev... | 2026.05 | 62 | 57 | 51 | 4 |
| P(True) | Backbone=Qwen 3 8B, Ev... | 2026.05 | 45 | 21 | 21 | 21 |
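
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how AUROC, Brier score, and ECE are commonly computed from per-question confidence scores and correctness labels. It is not this benchmark's evaluation code: the function names, the 10-bin equal-width ECE scheme, and the toy data are all assumptions made for illustration. The table values appear to be on a 0-100 scale, which is also an assumption; the functions below return values in [0, 1].

```python
# Illustrative sketch only; not the leaderboard's actual evaluation code.
# conf: confidence scores in [0, 1]; correct: 1 if the answer was right, else 0.

def auroc(conf, correct):
    """Probability that a correct answer gets higher confidence than an incorrect one."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(conf, correct):
    """Mean squared difference between confidence and correctness (lower is better)."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)

def ece(conf, correct, n_bins=10):
    """Expected calibration error with equal-width bins (assumed binning scheme)."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    total = len(conf)
    error = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            error += (len(b) / total) * abs(avg_conf - accuracy)
    return error

# Toy data (hypothetical): four answers, two correct and two incorrect.
toy_conf = [0.9, 0.8, 0.3, 0.2]
toy_correct = [1, 1, 0, 0]
print(auroc(toy_conf, toy_correct))
print(brier_score(toy_conf, toy_correct))
print(ece(toy_conf, toy_correct))
```

Higher is better for AUROC and AUPR, while lower is better for Brier score and ECE, so the columns cannot all be ranked in the same direction.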