Confidence Estimation on Closed-Set tasks

72 Accuracy (ACC)

GPT-4.1 + Verbalized Conf.

Nov 18, 2025
Updated 16d ago

Evaluation Results

Method  Links
2025.11   72     64.8   26.3   26.3
2025.11   71.4   -      -      -
2025.11   71.4   76     16.3   13.6
2025.11   71.3   73.4   19.7   17.8
2025.11   66.3   68.9   26.4   24.1
          65.6   73.3   17.8   14.3
2025.11   65.4   70.7   22.5   19.5
2025.11   65.2   -      -      -
2025.11   63.9   65.6   32.3   31.3
2025.11   63.8   -      -      -
2025.11   63.7   64.5   31.4   30.2
          63.5   68     19.9   17.1
2025.11   55.3   -      -      -
2025.11   54.5   65.5   39.2   39.3
          54.5   67     24.7   20.2
2025.11   53.5   65.8   42.6   42.5