Confidence Estimation on Chatbot Arena
[Chart: Rank Correlation (RK) by date; AUROC is available as an alternate metric. Highlighted entry: Verbalized Confidence, Rank Correlation (RK) 0.3524, May 14, 2026.]
Evaluation Results
| Method | Model | Date | Rank Correlation (RK) | AUROC (%) |
|---|---|---|---|---|
| Verbalized Confidence | Llama3-7B | 2026.05 | 0.3524 | 64.71 |
| Random Annotator | Llama3-7B | 2026.05 | 0.3428 | 65.47 |
| Predictive Probability | Llama3-7B | 2026.05 | 0.3407 | 66.15 |
| Simulated Annotators | Llama3-7B | 2026.05 | 0.3376 | 65.91 |
| Learning Confidence (Vanilla) | Llama3-7B | 2026.05 | 0.2956 | 69.13 |
| Random Annotator | Qwen2.5-32B | 2026.05 | 0.2812 | 72.04 |
| Predictive Probability | Qwen2.5-32B | 2026.05 | 0.2708 | 72.93 |
| Simulated Annotators | Qwen2.5-32B | 2026.05 | 0.2697 | 73.09 |
| Learning Confidence (Ours) | Llama3-7B | 2026.05 | 0.2658 | 72.01 |
| Learning Confidence (Vanilla) | Qwen2.5-32B | 2026.05 | 0.2574 | 74.33 |
| Learning Confidence (Ours) | Qwen2.5-32B | 2026.05 | 0.2253 | 77.49 |