Confidence Estimation on TL;DR
[Chart: Rank Correlation (RK) by date. Plotted entry: Verbalized Confidence, 0.421, May 14, 2026. Metrics available: Rank Correlation (RK), AUROC.]
Updated 16d ago
Evaluation Results
| Method | Model | Date | Rank Correlation (RK) | AUROC |
|---|---|---|---|---|
| Verbalized Confidence | Llama3-7B | 2026.05 | 0.421 | 58.05 |
| Predictive Probability | Llama3-7B | 2026.05 | 0.4094 | 58.93 |
| Random Annotator | Llama3-7B | 2026.05 | 0.4012 | 60.34 |
| Predictive Probability | Qwen2.5-32B | 2026.05 | 0.4006 | 59.24 |
| Simulated Annotators | Llama3-7B | 2026.05 | 0.3975 | 60.58 |
| Random Annotator | Qwen2.5-32B | 2026.05 | 0.3964 | 60.63 |
| Simulated Annotators | Qwen2.5-32B | 2026.05 | 0.3931 | 60.68 |
| Learning Confidence (Vanilla) | Llama3-7B | 2026.05 | 0.3912 | 61.34 |
| Learning Confidence (Vanilla) | Qwen2.5-32B | 2026.05 | 0.364 | 63.21 |
| Learning Confidence (Ours) | Llama3-7B | 2026.05 | 0.3461 | 65.39 |
| Learning Confidence (Ours) | Qwen2.5-32B | 2026.05 | 0.3196 | 67.6 |