PRISM: Probability Reallocation with In-Span Masking for Knowledge-Sensitive Alignment
About
Supervised fine-tuning (SFT) with token-level hard labels can amplify overconfident imitation of factually unsupported targets, causing hallucinations that propagate in multi-sentence generation. We study an augmented SFT setting in which training instances include coarse sentence-level factuality risk labels and inter-sentence dependency annotations, providing structured signals about where factual commitments are weakly supported. We propose **PRISM**, a differentiable risk-gated framework that modifies learning only at fact-critical positions. PRISM augments standard SFT with a lightweight, model-aware probability reallocation objective that penalizes high-confidence predictions on risky target tokens, with its scope controlled by span-level risk weights and model-aware gating. Experiments on hallucination-sensitive factual benchmarks and general evaluations show that PRISM improves factual aggregates across backbones while maintaining a competitive overall capability profile. Ablations further show that the auxiliary signal is most effective when used conservatively, and that knowledge masking and model-aware reallocation play complementary roles in balancing factual correction and capability preservation.
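The objective described above can be sketched as a standard SFT cross-entropy plus a gated confidence penalty. The sketch below is a minimal illustration under our own assumptions, not the paper's exact formulation: `risk_weights`, `lambda_risk`, and `gate_threshold` are hypothetical names, and the penalty term (target-token probability, active only where the model is already confident and the span is marked risky) is one plausible instantiation of "model-aware probability reallocation."

```python
# Hypothetical sketch of a PRISM-style risk-gated SFT loss.
# All names and the exact penalty form are illustrative assumptions.
import torch
import torch.nn.functional as F


def prism_style_loss(logits, targets, risk_weights,
                     lambda_risk=0.1, gate_threshold=0.5):
    """
    logits:       (B, T, V) model outputs
    targets:      (B, T) gold token ids
    risk_weights: (B, T) in [0, 1]; sentence-level factuality risk
                  broadcast down to its tokens
    """
    # Standard SFT cross-entropy, kept per-token.
    ce = F.cross_entropy(logits.transpose(1, 2), targets,
                         reduction="none")                      # (B, T)

    # Probability the model assigns to each target token.
    log_probs = F.log_softmax(logits, dim=-1)
    p_target = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).exp()

    # Model-aware gate: only positions where the model is already
    # confident contribute to the penalty.
    gate = (p_target > gate_threshold).float()

    # Penalize overconfident imitation on risky spans only; elsewhere
    # (risk_weights == 0) training reduces to plain SFT.
    penalty = risk_weights * gate * p_target                    # (B, T)

    return (ce + lambda_risk * penalty).mean()
```

With `risk_weights` all zero the loss is exactly mean cross-entropy, which matches the design goal of modifying learning only at fact-critical positions.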
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Factual Knowledge Evaluation | Factual Evaluation Suite (HHEM, PopQA, TriviaQA) | HHEM Accuracy: 95.13 | 12 |
| General Capability Evaluation | General Capability Suite (MMLU, GSM8K, HumanEval, IFEval) | MMLU: 77.54 | 12 |