SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention

About

Adapting LLMs with new knowledge is increasingly important, but standard fine-tuning often erodes aligned epistemic abstention: the ability to acknowledge when the model does not know. This failure mode is especially concerning in high-stakes settings, where abstention is a critical safeguard against hallucination. We present SEAT, a preventive fine-tuning method that preserves epistemic abstention while maintaining strong knowledge acquisition. SEAT combines sparse tuning, which constrains global activation drift, with entity-perturbed KL regularization, which sharpens local epistemic boundaries and prevents spillover to neighboring knowledge. Crucially, SEAT requires no alignment data, explicit boundary probing, or post-hoc re-alignment, making it attractive for lightweight and privacy-sensitive adaptation. Across models and datasets, SEAT improves human-evaluated abstention on unknown queries by 18%-101% over the strongest baseline while retaining near-perfect target knowledge acquisition, and produces coherent, context-aware abstentions after tuning. Further analyses show that both components are essential, that SEAT more cleanly separates known from unknown queries in representation space, and that it preserves downstream utility. These results identify preservation of epistemic abstention as a core objective for safe knowledge adaptation.
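The abstract names two ingredients, sparse tuning and entity-perturbed KL regularization, without giving formulas. The sketch below is one plausible reading of how the two could fit together, not the authors' implementation. The top-k mask rule, the KL direction (frozen reference model to tuned model), the weight `lam`, and every function name here are assumptions introduced for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def top_k_mask(scores, k):
    """Assumed selection rule: keep only the k highest-scoring parameters."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]

def sparse_update(params, grads, mask, lr):
    """Gradient step applied only where mask == 1; other parameters stay frozen."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]

def combined_loss(task_loss, frozen_logits, tuned_logits, lam):
    """Task loss plus a KL penalty computed on an entity-perturbed prompt.

    frozen_logits: next-token logits of the original model on the perturbed prompt.
    tuned_logits:  logits of the model being tuned on the same prompt.
    The penalty is zero when the tuned model still answers the perturbed
    prompt exactly as the original model did.
    """
    kl = kl_divergence(softmax(frozen_logits), softmax(tuned_logits))
    return task_loss + lam * kl

# Toy usage: four parameters, only the two with the highest scores are updated.
mask = top_k_mask([0.1, 0.9, 0.5, 0.2], k=2)
new_params = sparse_update([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5], mask, lr=0.1)
loss_same = combined_loss(2.0, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], lam=0.5)
loss_drift = combined_loss(2.0, [1.0, 2.0, 3.0], [3.0, 2.0, 1.0], lam=0.5)
```

In this reading, the mask limits how many parameters can move at all, while the KL term penalizes changes in behavior on prompts whose entity has been swapped for a neighboring one, so `loss_drift` exceeds `loss_same`.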

William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane • 2025

Related benchmarks

Task	Dataset	Metric	Result
Fine-tuning for knowledge acquisition and abstention preservation	RWD	FT Score	1	14
Fine-tuning for knowledge acquisition and abstention preservation	PISTOL	FT Score	99.5	14
Fine-tuning for knowledge acquisition and abstention preservation	TOFU	FT Score	99.9	14
