SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment

About

Existing preference alignment is a one-size-fits-all mechanism: the part of a large language model's (LLM's) parametric knowledge that has non-preferred features is uniformly blocked for all users. However, this knowledge can be useful to advanced users whose expertise qualifies them to handle this information. The one-size-fits-all alignment mechanism therefore undermines an LLM's utility for these qualified users. To address this problem, we propose SudoLM, a framework that lets LLMs learn access control over specific parametric knowledge for users with different credentials via authorization alignment. SudoLM allows authorized users to unlock access to all of the parametric knowledge with an assigned SUDO key, while blocking access for non-qualified users. Experiments on two application scenarios demonstrate that SudoLM effectively controls a user's access to the parametric knowledge while maintaining the model's general utility.
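The access-control behaviour described above can be pictured as preference data: the same query prefers an informative answer when the SUDO key is present and a refusal when it is absent. The following is a minimal illustrative sketch under that assumption, not the authors' implementation. The key string, the refusal text, and the names `PreferencePair` and `build_pairs` are all hypothetical, and the abstract does not specify the key format or the training objective.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical placeholders: the source does not say what a SUDO key
# or a refusal response actually looks like.
SUDO_KEY = "<hypothetical-sudo-key>"
REFUSAL = "I can't help with that request."


@dataclass(frozen=True)
class PreferencePair:
    """One preference example: for `prompt`, `chosen` is preferred over `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def build_pairs(query: str, privileged_answer: str,
                sudo_key: str = SUDO_KEY) -> List[PreferencePair]:
    """Build two preference pairs for one query about restricted knowledge.

    Assumption: the key is simply prepended to the prompt. With the key,
    the informative answer is preferred; without it, the refusal is preferred.
    """
    authorized = PreferencePair(
        prompt=f"{sudo_key} {query}",
        chosen=privileged_answer,
        rejected=REFUSAL,
    )
    unauthorized = PreferencePair(
        prompt=query,
        chosen=REFUSAL,
        rejected=privileged_answer,
    )
    return [authorized, unauthorized]
```

Pairs in this shape could, in principle, be fed to any preference-optimization trainer; which objective SudoLM actually uses is not stated in the abstract.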

Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen • 2024

Related benchmarks

Task	Dataset	Result
Multi-task Language Understanding	MMLU	Accuracy 63.9	881
Multi-turn Dialogue Evaluation	MT-Bench	Overall Score 7.97	532
Massive Multitask Language Understanding	MMLU	Accuracy 38.91	137
Question Answering	SQuAD	Exact Match 68.48	83
Scientific Reasoning	ARC	Score 82.3	29
Hazard Knowledge Evaluation	WMDP	Accuracy 35.24	26
Mobile Interaction Action Prediction	Mobile Actions	Accuracy 69.06	18
Question Answering	CovidQA	Accuracy 59.04	15
Medical Question Answering	Medical QA	GPT-4 Score 92.5	9
Privileged knowledge recall	TOFU	ROUGE-L Recall 97.6	9

Showing 10 of 12 rows
