
TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

About

This paper investigates abstention learning for large language models (LLMs), building on the ternary reward that has been used to incentivize truthfulness in LLMs. It extends that idea by moving from a static ternary reward to Trajectory-Informed Advantage Reweighting (TIAR), which dynamically reweights the abstention reward during Group Relative Policy Optimization (GRPO) training. The work targets abstention learning rather than truthfulness in general, and serves as an exploration of hallucination reduction. Its novelty lies in three areas: the method, the advantage reweighting, and the benchmark selection. The method treats the multiple trajectories that GRPO samples for each query as a natural abstention signal, using the reward signal to explore the model's knowledge boundaries and to encourage consistency. The paper shows that these trajectories can serve as an indicator of the policy's confidence on a query, and then uses them to compute the abstention advantage dynamically. Because the work aims to contribute to abstention learning, AbstentionBench is used as the evaluation benchmark, and the method and several baselines are evaluated on every dataset in it. Empirically, TIAR achieves state-of-the-art abstention F1 scores in five of six evaluation categories and outperforms the static ternary baseline on 17 of 31 benchmark datasets while fully preserving baseline accuracy.

Muyu Pan, Shu Zhao, Nan Zhang, Philip Shin, Varun Parekh, Vijaykrishnan Narayanan, Rui Zhang • 2026
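
The abstract describes reweighting the abstention advantage using the group of trajectories GRPO samples per query, but this page does not give the formula. The sketch below is only an illustration under stated assumptions: the ternary reward values (+1, 0, -1), the helper names, the `lam` parameter, and the specific shift rule are all hypothetical and are not taken from the paper. It computes standard group-relative advantages, then shifts the advantage of abstaining trajectories according to how often the group answered correctly, as a stand-in for "trajectories as a confidence indicator".

```python
from statistics import mean, pstdev

# Assumed ternary reward values (illustrative, not from the paper).
REWARD = {"correct": 1.0, "abstain": 0.0, "wrong": -1.0}


def grpo_advantages(outcomes, eps=1e-6):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    rewards = [REWARD[o] for o in outcomes]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reweighted_advantages(outcomes, lam=1.0, eps=1e-6):
    """Hypothetical reweighting rule (an assumption, not the paper's formula).

    The fraction of correct trajectories in the group is used as a confidence
    proxy. Abstaining trajectories get a bonus when the group is rarely
    correct and a penalty when it is usually correct.
    """
    base = grpo_advantages(outcomes, eps)
    correct_rate = outcomes.count("correct") / len(outcomes)
    shift = lam * (1.0 - 2.0 * correct_rate)  # ranges from +lam to -lam
    return [a + shift if o == "abstain" else a for a, o in zip(base, outcomes)]


# One query, four sampled trajectories, mostly wrong: abstention is boosted.
group = ["correct", "abstain", "wrong", "wrong"]
print(reweighted_advantages(group))
```

In this sketch only the abstaining trajectories are modified, so the advantages of correct and wrong answers stay exactly as plain GRPO would compute them.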

Related benchmarks

Task | Dataset | Result | Rank
Abstention in Question Answering | UMWP Underspecified Context | Abstention F1: 94 | 10
Abstention in Question Answering | KUQ Cont. Subjective | Abstention F1: 87.2 | 10
Abstention in Question Answering | BB Answer Unknown | Abstention F1: 97.9 | 10
Abstention in Question Answering | BBQ Underspecified Intent | Abstention F1: 91.2 | 10
Abstention in Question Answering | QAQA False Premise | Abstention F1: 79.7 | 10
Abstention in Question Answering | FreshQA Stale | Abstention F1: 80.4 | 10
Abstention | AbstentionBench BB Known Unk. | Abstention F1: 95.8 | 4
Abstention | AbstentionBench FreshQA | Abstention F1: 68.4 | 4
Abstention | AbstentionBench BBQ | Abstention F1: 87.5 | 4
Abstention | AbstentionBench KUQ Cont | Abstention F1: 92.9 | 4
Showing 10 of 12 rows
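
The results above are reported as abstention F1. This page does not define the metric, so the sketch below assumes the usual F1 score with "the model should abstain" as the positive class, reported as a fraction (multiply by 100 to compare with the table). The function name and the boolean-list input format are illustrative choices.

```python
def abstention_f1(should_abstain, did_abstain):
    """F1 over the abstain class, given parallel lists of booleans."""
    pairs = list(zip(should_abstain, did_abstain))
    tp = sum(1 for s, d in pairs if s and d)        # abstained when it should
    fp = sum(1 for s, d in pairs if not s and d)    # abstained needlessly
    fn = sum(1 for s, d in pairs if s and not d)    # answered when it should abstain
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Four queries: two unanswerable, two answerable.
labels = [True, True, False, False]
preds = [True, False, True, False]
print(abstention_f1(labels, preds))
```

Under this definition a model that never abstains scores 0, and one that always abstains gets perfect recall but low precision, so the score rewards abstaining selectively.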
