Confidence Estimation on Open-Set

67.2 Accuracy

GPT-4.1 + Verbalized Conf.

Updated 3mo ago

Evaluation Results

Method	Date	Accuracy	(unlabeled)	(unlabeled)	(unlabeled)
GPT-4.1 + Verbalized Conf.	2025.11	67.2	67.9	28.8	27.7
GPT-4.1 + Verbalized Probability Distribution	2025.11	66.9	71.8	21.0	10.0
GPT-4.1	2025.11	66.5	-	-	-
GPT-4.1 + Verbalized Top-k	2025.11	64.8	70.5	24.0	18.6
DeepSeek-V3 + Verbalized Conf.	2025.11	58.3	71.2	30.1	28.5
DeepSeek-V3	2025.11	57.6	-	-	-
DeepSeek-V3 + Verbalized Probability Distribution	2025.11	57.3	73.1	21.5	9.4
DeepSeek-V3 + Verbalized Top-k	2025.11	55.8	67.1	28.4	24.2
Qwen3-30B-A3B-Instruct + Verbalized Top-k	2025.11	44.4	63.4	46.8	47.7
Qwen3-30B-A3B-Instruct + Verbalized Conf.	2025.11	43.5	62.6	51.2	51.9
Qwen3-30B-A3B-Instruct + Verbalized Probability Distribution	2025.11	43.4	68.1	30.4	27.4
Qwen3-30B-A3B-Instruct	2025.11	42.1	-	-	-
Qwen3-4B-Instruct + Verbalized Probability Distribution	2025.11	32.4	68.1	33.3	33.1
Qwen3-4B-Instruct + Verbalized Top-k	2025.11	31.0	64.0	53.9	57.0
Qwen3-4B-Instruct + Verbalized Conf.	2025.11	30.3	64.5	60.9	63.0
Qwen3-4B-Instruct	2025.11	30.1	-	-	-
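
The table lists each model both on its own and with three verbalized-confidence prompts, so one way to read it is as the accuracy change each prompt produces relative to the plain model. The Python sketch below computes that change. It assumes the first numeric column is accuracy (the page headline pairs 67.2 with "Accuracy" for the top row); the names `ACCURACY` and `accuracy_deltas` are hypothetical and exist only for this illustration. The other three columns are left out because the page does not label them.

```python
# Accuracy values copied from the first numeric column of the table above.
# Assumption: that column is accuracy, as the page headline "67.2 Accuracy" suggests.
ACCURACY = {
    "GPT-4.1": {
        "base": 66.5,
        "Verbalized Conf.": 67.2,
        "Verbalized Probability Distribution": 66.9,
        "Verbalized Top-k": 64.8,
    },
    "DeepSeek-V3": {
        "base": 57.6,
        "Verbalized Conf.": 58.3,
        "Verbalized Probability Distribution": 57.3,
        "Verbalized Top-k": 55.8,
    },
    "Qwen3-30B-A3B-Instruct": {
        "base": 42.1,
        "Verbalized Conf.": 43.5,
        "Verbalized Probability Distribution": 43.4,
        "Verbalized Top-k": 44.4,
    },
    "Qwen3-4B-Instruct": {
        "base": 30.1,
        "Verbalized Conf.": 30.3,
        "Verbalized Probability Distribution": 32.4,
        "Verbalized Top-k": 31.0,
    },
}


def accuracy_deltas(table):
    """Return, per model, each prompting method's accuracy minus the base model's."""
    deltas = {}
    for model, rows in table.items():
        base = rows["base"]
        deltas[model] = {
            method: round(acc - base, 1)  # one decimal, matching the table's precision
            for method, acc in rows.items()
            if method != "base"
        }
    return deltas


if __name__ == "__main__":
    for model, per_method in accuracy_deltas(ACCURACY).items():
        for method, delta in per_method.items():
            print(f"{model} + {method}: {delta:+.1f}")
```

Under that reading, the changes are small for every model (within about 2.3 points in either direction), and Verbalized Top-k lowers accuracy for GPT-4.1 and DeepSeek-V3 while raising it for both Qwen3 models.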