Prism-$\Delta$: Differential Subspace Steering for Prompt Highlighting in Large Language Models
About
Prompt highlighting steers a large language model to prioritize user-specified text spans during generation. A key challenge is extracting steering directions that capture the difference between relevant and irrelevant contexts, rather than shared structural patterns common to both. We propose PRISM-$\Delta$ (Projection-based Relevance-Informed Steering Method), which decomposes the difference between positive and negative cross-covariance matrices to maximize discriminative energy while eliminating shared directions. Each attention head receives a continuous softplus importance weight, letting weak-but-useful heads contribute at reduced strength. The framework extends naturally to Value representations, capturing content-channel signal that Key-only methods leave unused. Across four benchmarks and five models, PRISM-$\Delta$ matches or exceeds the best existing method on 19 of 20 configurations, with relative gains of up to 10.6%, while halving the fluency cost of steering. PRISM-$\Delta$ also scales to long-context retrieval, where it outperforms the best existing method by up to 4.8% (relative). PRISM-$\Delta$ is compatible with FlashAttention and adds negligible memory overhead.
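To make the core idea concrete, the following is a minimal NumPy sketch of differential subspace extraction, not the authors' implementation. The abstract does not say which representations enter the cross-covariance, how the decomposition is computed, or how head scores are obtained, so several choices here are assumptions: the cross-covariance is taken between query and key features, the decomposition is a plain SVD of the difference matrix, and the function names (`cross_covariance`, `differential_directions`, `head_weight`, `steer`) are hypothetical.

```python
import numpy as np


def cross_covariance(queries, keys):
    """Cross-covariance of centered features; inputs are (n, d), output is (d, d).

    Assumption: the statistic pairs queries with keys. The abstract only
    says "cross-covariance matrices".
    """
    q = queries - queries.mean(axis=0, keepdims=True)
    k = keys - keys.mean(axis=0, keepdims=True)
    return q.T @ k / len(q)


def differential_directions(c_pos, c_neg, rank):
    """Top-`rank` right singular vectors of the difference c_pos - c_neg.

    Structure shared by both matrices cancels in the subtraction, so the
    leading directions carry only what distinguishes the two conditions.
    """
    delta = c_pos - c_neg
    _, singular_values, vt = np.linalg.svd(delta)
    return vt[:rank], singular_values[:rank]


def head_weight(score):
    """Continuous softplus weight log(1 + exp(score)), computed stably.

    Assumption: `score` is some per-head relevance score; how it is
    obtained is not described in the abstract.
    """
    return np.logaddexp(0.0, score)


def steer(features, directions, weight):
    """Amplify the part of each feature vector inside the steering subspace."""
    projector = directions.T @ directions  # (d, d), rows of `directions` are orthonormal
    return features + weight * (features @ projector)


# Toy illustration: both conditions share a strong pattern along axis 0,
# and only the positive condition has extra energy along axis 1.
e0, e1 = np.eye(4)[0], np.eye(4)[1]
shared = 5.0 * np.outer(e0, e0)
c_pos = shared + 2.0 * np.outer(e1, e1)
c_neg = shared
directions, energy = differential_directions(c_pos, c_neg, rank=1)
steered = steer(np.array([[1.0, 1.0, 0.0, 0.0]]), directions, weight=0.5)
```

In this toy setup, an SVD of `c_pos` alone would rank the shared axis first because it has the most energy, whereas the difference matrix removes it entirely. The same `steer` call could be applied to Value features instead of Key features, which is the spirit of the extension mentioned in the abstract.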
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Knowledge Editing | CounterFact | Efficacy | 99.24 | 301 |
| Gender bias evaluation | Pronoun Change | Performance Score (P) | 99.66 | 35 |
| Bias classification | BiasBios | Accuracy | 92.9 | 35 |
| Long-context retrieval | Lost-in-the-Middle 30-passage contexts | Average Exact Match | 62.57 | 20 |
| Factual Knowledge Editing | CounterFact (indices 0-5000) | -- | -- | 5 |
| Gender Bias Mitigation | BiasBios (indices 0-4999) | -- | -- | 5 |
| Pronoun Steering | Pronoun Change (test) | -- | -- | 5 |