
Confidence Estimation (Iterative Tagging) on WildHallu

Brier Score (BS): 5.7 (LOVEC-GRPO)

May 29, 2025
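The Brier score (BS) is the mean squared difference between a model's stated confidence that an answer is correct and whether the answer actually was correct (1) or not (0). Lower is better. The minimal Python sketch below is illustrative and not taken from the leaderboard's evaluation code. It assumes confidences in [0, 1] and binary correctness labels. It also assumes the leaderboard reports the score scaled by 100, which the page does not state. The function name `brier_score` and the example data are hypothetical.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared error between predicted confidence and binary outcome.

    confidences: the model's probability that each answer is correct, in [0, 1].
    correct: 1 if the answer was actually correct, else 0.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")
    total = 0.0
    for p, o in zip(confidences, correct):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence {p} is outside [0, 1]")
        if o not in (0, 1):
            raise ValueError(f"outcome {o} is not 0 or 1")
        # Squared gap between stated confidence and what actually happened.
        total += (p - o) ** 2
    return total / len(confidences)


# Hypothetical example: three answers with stated confidences and correctness labels.
conf = [0.9, 0.2, 0.7]
labels = [1, 0, 0]
bs = brier_score(conf, labels)
print(f"Brier score: {bs:.4f}")          # (0.01 + 0.04 + 0.49) / 3 -> 0.1800
# Assumption: the leaderboard's BS values are on a 0-100 scale.
print(f"Scaled by 100: {bs * 100:.1f}")  # -> 18.0
```

A confidently wrong answer (0.7 confidence on an incorrect answer here) dominates the score because the error is squared, which is why the Brier score rewards well-calibrated confidence rather than only accuracy.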

Evaluation Results

Method | Links
2025.05
5.72.557
2025.05
6560.4
2025.05
9.115.251.1
2025.05
10.869.1
2025.05
14.521.556.8
2025.05
16.419.547.5
2025.05
16.524.347.8
2025.05
20.322.113.4
2025.05
23.823.615.8