
Chain-of-Authorization: Embedding authorization into large language models

About

Although Large Language Models (LLMs) have evolved from text generators into the cognitive core of modern AI systems, their inherent lack of authorization awareness exposes these systems to catastrophic risks, ranging from unintentional data leakage to unauthorized command execution. Existing defense mechanisms are fundamentally decoupled from internal reasoning, rendering them insufficient for the complex security demands of dynamic AI systems. Here, we propose the Chain-of-Authorization (CoA) framework, a paradigm that internalizes access control as a foundational cognitive capability. By systematically redesigning the input-output format and fine-tuning the model on synthesized data with complex permission topologies, CoA forces the model to generate a structured authorization trajectory as a causal prerequisite for any substantive response or action, thereby enabling LLMs to internalize access boundaries within dynamic reasoning environments. CoA maintains high utility in authorized scenarios while achieving high rejection rates of unauthorized prompts and robust defense against diverse adversarial attacks. By embedding authorization directly into the reasoning process, CoA provides a principled architectural blueprint for deploying secure LLMs as the cognitive cores of modern AI systems.
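The core mechanism described in the abstract — generating an explicit authorization trajectory and treating it as a prerequisite for any substantive answer — can be illustrated with a small sketch. Everything below (the `AuthStep` record, the trajectory format, the policy mapping) is an illustrative assumption for exposition, not the paper's actual implementation, which internalizes this check via fine-tuning rather than external code.

```python
# Minimal sketch of the Chain-of-Authorization idea: check an explicit
# per-resource authorization trajectory before emitting any answer.
# All names and the trajectory format are hypothetical.

from dataclasses import dataclass

@dataclass
class AuthStep:
    resource: str   # resource the request touches
    required: str   # permission needed for that resource
    granted: bool   # whether the caller holds that permission

def build_trajectory(resources, policy, caller_perms):
    """One authorization step per requested resource, checked
    against the caller's permission set."""
    return [AuthStep(r, policy[r], policy[r] in caller_perms)
            for r in resources]

def respond(question, resources, policy, caller_perms, answer_fn):
    """Produce a substantive answer only if every step in the
    trajectory is authorized; otherwise refuse, citing the
    first failing step."""
    trajectory = build_trajectory(resources, policy, caller_perms)
    for step in trajectory:
        if not step.granted:
            return trajectory, (f"REFUSED: missing '{step.required}' "
                                f"for '{step.resource}'")
    return trajectory, answer_fn(question)

# Toy policy: resource -> required permission.
policy = {"salary_db": "hr.read", "public_docs": "docs.read"}

traj, out = respond(
    "What is Alice's salary?",
    ["salary_db"],
    policy,
    caller_perms={"docs.read"},   # caller lacks hr.read
    answer_fn=lambda q: "ANSWER",
)
print(out)  # → REFUSED: missing 'hr.read' for 'salary_db'
```

In the paper's setting this gating is learned, so the trajectory is part of the model's own generated reasoning rather than a wrapper around it; the sketch only shows the causal ordering (authorize, then answer) that CoA enforces.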

Yang Li, Yule Liu, Xinlei He, Youjian Zhao, Qi Li, Ke Xu • 2026

Related benchmarks

Task                                      Dataset         Metric       Result   Rank
Massive Multitask Language Understanding  MMLU            Accuracy     62.87    117
Question Answering                        SQuAD           Exact Match  86.13    83
Hazard Knowledge Evaluation               WMDP            Accuracy     68.98    18
Mobile Interaction Action Prediction      Mobile Actions  Accuracy     96.09    18
Question Answering                        CovidQA         Accuracy     62.14    15
