LLM Judgement Confidence Estimation on Chatbot Arena (test)

0.3418RK

Verbalized Confidence

Updated 2mo ago

Evaluation Results

Method	Links
Verbalized Confidence 2026.05		0.3418	65.03
Random Annotator 2026.05		0.3355	66.28
Predictive Probability 2026.05		0.3323	66.5
Simulated Annotators 2026.05		0.323	67.67
Learning Confidence (Vanilla) 2026.05		0.2817	70.32
Margin-Adaptive Confidence Ranking 2026.05		0.2743	71.27
Simulated Annotators 2026.05		0.2646	73.54
Random Annotator 2026.05		0.2629	73.41
Predictive Probability 2026.05		0.2552	74.56
Learning Confidence (Vanilla) 2026.05		0.2486	75.2
Random Annotator 2026.05		0.248	75.19
Simulated Annotators 2026.05		0.2469	75.12
Predictive Probability 2026.05		0.2457	75.36
Learning Confidence (Vanilla) 2026.05		0.2435	76.1
Margin-Adaptive Confidence Ranking 2026.05		0.2165	78.72
Margin-Adaptive Confidence Ranking 2026.05		0.2077	78.4