MASH: Modeling Abstention via Selective Help-Seeking
About
LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to questions outside those boundaries. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e., search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while answer accuracy is simultaneously rewarded. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient-search approaches; on multi-hop datasets, it improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention performance, competitive with prior abstention methods that additionally require predetermining model knowledge boundaries to construct training data. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions and for efficient search tool use.
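The pay-per-search idea described above can be illustrated with a minimal reward function: answer accuracy earns a positive reward, while each search call incurs a fixed cost. This is a hedged sketch, not the paper's implementation; the function name and the cost value `0.1` are illustrative assumptions.

```python
def pay_per_search_reward(answer_correct: bool, num_searches: int,
                          search_cost: float = 0.1) -> float:
    """Sketch of a pay-per-search RL reward.

    Correct answers are rewarded, but every search-tool call is
    charged a fixed cost, so the policy learns to search only when
    its parametric knowledge is insufficient. Search use then acts
    as a proxy for abstention.

    Note: search_cost=0.1 is an assumed value for illustration.
    """
    accuracy_reward = 1.0 if answer_correct else 0.0
    return accuracy_reward - search_cost * num_searches
```

Under this reward, a correct answer with no searches scores higher than a correct answer that required searches, which is what pushes tool use to align with the model's knowledge boundary.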
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Question Answering | HotpotQA (ID) | Accuracy | 55.6 | 18 |
| Question Answering | NQ (ID) | Accuracy | 67 | 18 |
| Question Answering | SimpleQA-verified (OOD) | Accuracy | 41.5 | 18 |
| Question Answering | HotpotQA (test) | Accuracy | 20.98 | 12 |
| Question Answering | 2Wiki (test) | EM Accuracy | 4.6 | 12 |
| Abstention Classification | NaturalQA (test) | Accuracy (Abs=0) | 99.9 | 9 |
| Abstention Classification | HotpotQA (test) | Abs(0) | 0.948 | 9 |
| Question Answering | HotpotQA (test) | Accuracy | 17.3 | 9 |
| Question Answering | NaturalQA (test) | Accuracy | 20.9 | 9 |
| Question Answering | HotpotQA (test) | Accuracy | 55.42 | 6 |