From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation
About
Complex social behaviors, such as empathy and strategic politeness, are widely assumed to resist the directional decomposition that makes activation steering effective for coarse attributes like sentiment or toxicity. We present STAR (Steering via Attribution and Representation), which tests this assumption by using attribution patching to identify the layer-token positions where each behavioral trait causally originates, then injecting contrastive activation vectors at precisely those locations. We evaluate STAR on emotional dialogue and negotiation in both single- and multi-turn settings, where localized injection consistently outperforms global steering and instruction priming; human evaluation confirms that the gains reflect genuine improvements in perceived quality rather than surface-level lexical changes. Our results suggest that complex interpersonal behaviors are encoded as localized, approximately linear directions in LLM activation space, and that behavioral alignment is fundamentally a localization problem.
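The two-stage recipe described above (localize with attribution patching, then inject a contrastive vector only at the localized sites) can be illustrated with a minimal pure-Python sketch. It assumes activations are already collected per (layer, token) site as plain float lists, and that the gradient of a behavior metric with respect to each clean activation is available (in a real model these would come from forward hooks and a backward pass). The function names, the top-k-by-absolute-score selection rule, the mean-difference vector, and the scale `alpha` are hypothetical illustrative choices, not necessarily STAR's exact procedure.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, int]           # (layer index, token position)
Acts = Dict[Site, List[float]]   # activation (or gradient) vector per site


def _dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def attribution_scores(clean: Acts, corrupt: Acts, grads: Acts) -> Dict[Site, float]:
    """Attribution patching: first-order estimate of how much the behavior
    metric M would change if the clean activation at each site were replaced
    by the corrupted one, i.e. (a_corrupt - a_clean) . dM/da (gradient taken
    on the clean run)."""
    scores = {}
    for site, a_clean in clean.items():
        delta = [c - a for c, a in zip(corrupt[site], a_clean)]
        scores[site] = _dot(delta, grads[site])
    return scores


def top_sites(scores: Dict[Site, float], k: int) -> List[Site]:
    """Keep the k sites with the largest absolute attribution (assumed rule)."""
    return sorted(scores, key=lambda s: abs(scores[s]), reverse=True)[:k]


def contrastive_vector(pos: List[List[float]], neg: List[List[float]]) -> List[float]:
    """Mean activation on trait-positive examples minus mean on trait-negative ones."""
    dim = len(pos[0])
    mean_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def inject(hidden: Acts, vectors: Dict[Site, List[float]], alpha: float) -> Acts:
    """Add alpha * v only at the selected sites; all other sites are copied unchanged."""
    steered = {site: list(h) for site, h in hidden.items()}
    for site, v in vectors.items():
        steered[site] = [h + alpha * x for h, x in zip(steered[site], v)]
    return steered


if __name__ == "__main__":
    # Toy activations at three (layer, token) sites with 2-dimensional vectors.
    clean = {(0, 0): [1.0, 0.0], (1, 2): [0.5, 0.5], (2, 1): [0.0, 1.0]}
    corrupt = {(0, 0): [1.0, 0.1], (1, 2): [2.5, 0.5], (2, 1): [0.0, 0.0]}
    grads = {(0, 0): [0.1, 0.1], (1, 2): [1.0, 0.0], (2, 1): [0.0, 0.5]}
    sites = top_sites(attribution_scores(clean, corrupt, grads), k=2)
    print(sites)  # [(1, 2), (2, 1)]
```

The contrast with global steering is in `inject`: a global method would add the vector at every site (or every token of a fixed layer), whereas the localized variant touches only the sites that attribution singled out.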
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Disclosure | BOLT SMS multi-turn | M_steer | 5 | 28 |
| Disclosure | BOLT SMS single-turn | T-Statistic | 66.34 | 28 |
| Emotional Support | BOLT SMS multi-turn | M_steer Score | 0.05 | 28 |
| Emotional Support | BOLT SMS single-turn | T-Statistic | 66.45 | 28 |
| Disclosure | BOLT SMS multi-turn (test) | Chi-Squared Statistic | 0.01 | 14 |
| Disclosure | BOLT SMS single-turn (test) | Chi-Squared (χ²) | 0.41 | 14 |
| Emotional Support | BOLT SMS multi-turn (test) | Chi-Squared | 0.03 | 14 |
| Emotional Support | BOLT SMS single-turn (test) | Chi-Squared (χ²) | 948.7 | 14 |
| Multi-Turn Negotiation | Craigslist Bargain (test) | Gratitude | 51 | 3 |
| Negotiation | Craigslist Bargain Single-Turn | Agreement Rate | 4.2 | 3 |