
Confidence Estimation on Closed-Set tasks

72 Accuracy (ACC)

GPT-4.1 + Verbalized Conf.

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy (ACC)
GPT-4.1 + Verbalized Conf. 2025.11		72	64.8	26.3	26.3
GPT-4.1 2025.11		71.4	-	-	-
GPT-4.1 + Verbalized Probability Distribution 2025.11		71.4	76	16.3	13.6
GPT-4.1 + Verbalized Top-k 2025.11		71.3	73.4	19.7	17.8
DeepSeek-V3 + Verbalized Conf. 2025.11		66.3	68.9	26.4	24.1
DeepSeek-V3 + Verbalized Probability Distribution 2025.11		65.6	73.3	17.8	14.3
DeepSeek-V3 + Verbalized Top-k 2025.11		65.4	70.7	22.5	19.5
DeepSeek-V3 2025.11		65.2	-	-	-
Qwen3-30B-A3B-Instruct + Verbalized Conf. 2025.11		63.9	65.6	32.3	31.3
Qwen3-30B-A3B-Instruct 2025.11		63.8	-	-	-
Qwen3-30B-A3B-Instruct + Verbalized Top-k 2025.11		63.7	64.5	31.4	30.2
Qwen3-30B-A3B-Instruct + Verbalized Probability Distribution 2025.11		63.5	68	19.9	17.1
Qwen3-4B-Instruct 2025.11		55.3	-	-	-
Qwen3-4B-Instruct + Verbalized Top-k 2025.11		54.5	65.5	39.2	39.3
Qwen3-4B-Instruct + Verbalized Probability Distribution 2025.11		54.5	67	24.7	20.2
Qwen3-4B-Instruct + Verbalized Conf. 2025.11		53.5	65.8	42.6	42.5