Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

About

Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction handling, particularly when exposed to adversarial scenarios. In this work, we identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA), which exploits function-calling mechanisms to subvert model behavior. To evaluate LLM robustness against such threats, we introduce the Tool-Completion benchmark, a comprehensive security assessment framework, which reveals that even state-of-the-art models remain susceptible to TCA, with surprisingly high attack success rates. To address these vulnerabilities, we introduce Context-Aware Hierarchical Learning (CAHL), a sophisticated mechanism that dynamically balances semantic comprehension with role-specific instruction constraints. CAHL leverages the contextual correlations between different instruction segments to establish a robust, context-aware instruction hierarchy. Extensive experiments demonstrate that CAHL significantly enhances LLM robustness against both conventional attacks and the proposed TCA, exhibiting strong generalization capabilities in zero-shot evaluations while still preserving model performance on generic tasks. Our code is available at https://github.com/S2AILab/CAHL.
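
The abstract reports attack success rates (ASR) without defining the metric on this page. As a minimal sketch, assuming ASR is the fraction of adversarial trials in which the model follows the injected instruction (the common convention; the paper's exact success criterion is not stated here), the computation could look like the following. `AttackTrial` and `attack_success_rate` are hypothetical names used for illustration and are not taken from the CAHL repository.

```python
from dataclasses import dataclass


@dataclass
class AttackTrial:
    """One adversarial prompt and whether the model obeyed the injected instruction."""
    prompt_id: str
    attack_succeeded: bool


def attack_success_rate(trials: list[AttackTrial]) -> float:
    """Return the fraction of trials in which the attack succeeded (0.0 to 1.0)."""
    if not trials:
        raise ValueError("at least one trial is required")
    # bool is a subclass of int, so summing counts the successful attacks.
    return sum(t.attack_succeeded for t in trials) / len(trials)


# Hypothetical outcomes, invented for this example only.
trials = [
    AttackTrial("p1", True),
    AttackTrial("p2", False),
    AttackTrial("p3", False),
    AttackTrial("p4", True),
]
asr = attack_success_rate(trials)
print(f"ASR = {asr:.2f} ({asr * 100:.0f}%)")  # prints "ASR = 0.50 (50%)"
```

The function returns a fraction; multiply by 100 to express it as a percentage. Lower ASR indicates a more robust model.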

Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, Yu Li, Shaohui Liu, Zhuotao Tian • 2025

Related benchmarks

Task                                          | Dataset                 | Result                 | Rank
Prompt Injection Attack                       | Tool-Completion (TCA)   | ASR 0.12               | 14
Structured Query Instruction Following        | StruQ clean             | Capability 78.89       | 8
Prompt Injection Attack                       | Tool-Completion TCA-e   | ASR 56                 | 7
Prompt Injection Attack                       | Tool-Completion Naive-e | ASR 15                 | 7
Prompt Injection                              | GCG Clean               | ASR 37.02              | 4
Instruction Adherence and Security Robustness | StruQ 1.0 (Adversarial) | Capability Score 83.79 | 4
Instruction Adherence and Security Robustness | StruQ Clean 1.0         | Capability Score 83.6  | 4
Tool Completion                               | Tool-Completion Adv     | Capability 81.35       | 3
Tool Completion                               | Tool-Completion Clean   | Capability 77.12       | 3
