
LLM Judgement Confidence Estimation on TL;DR (test)

0.4269 RK

Verbalized Confidence

[Chart: y-axis ticks 0.273084, 0.313017, 0.35295, 0.392883; x-axis May 14, 2026]
Updated 16d ago

Evaluation Results

Method  Links
2026.05  0.4269  0.5798
2026.05  0.4191  0.5806
2026.05  0.4159  0.5852
2026.05  0.4085  0.601
2026.05  0.4029  0.5987
2026.05  0.397   0.6081
2026.05  0.3919  0.6052
2026.05  0.3895  0.6123
2026.05  0.3718  0.6271
2026.05  0.3646  0.6364
2026.05  0.3613  0.631
2026.05  0.3584  0.6401
2026.05  0.3572  0.6409
2026.05  0.3382  0.6577
2026.05  0.3137  0.6813
2026.05  0.279   0.7021