Confidence Estimation (Iterative Tagging) on WildHallu

Top result: LOVEC-GRPO, Brier Score (BS) 5.7

Updated 2mo ago

Evaluation Results

Method	Links	Brier Score (BS)	(metric unlabeled)	(metric unlabeled)
LOVEC-GRPO 2025.05		5.7	2.5	57
LOVEC-DPO 2025.05		6	5	60.4
LOVEC-SFT 2025.05		9.1	15.2	51.1
Vanilla 2025.05		10.8	6	9.1
LUQ 2025.05		14.5	21.5	56.8
p(true)-ft 2025.05		16.4	19.5	47.5
Self-Cons 2025.05		16.5	24.3	47.8
Verb-Conf 2025.05		20.3	22.1	13.4
p(true) 2025.05		23.8	23.6	15.8