
Confidence Estimation on AlpacaEval

Rank Correlation (RK): 0.4395

Verbalized Confidence

[Chart: Rank Correlation (RK); y-axis ticks 0.2783, 0.32015, 0.362, 0.40385; date May 14, 2026]
Updated 16d ago

Evaluation Results

Date      Rank Correlation (RK)   (unlabeled metric)
2026.05   0.4395                  0.5589
2026.05   0.4339                  0.557
2026.05   0.4269                  0.574
2026.05   0.4159                  0.585
2026.05   0.4129                  0.5904
2026.05   0.3958                  0.6022
2026.05   0.3874                  0.6107
2026.05   0.3871                  0.6189
2026.05   0.3345                  0.6679
2026.05   0.3297                  0.671
2026.05   0.2845                  0.7012