
From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation

About

Complex social behaviors, such as empathy and strategic politeness, are widely assumed to resist the directional decomposition that makes activation steering effective for coarse attributes like sentiment or toxicity. We present STAR: Steering via Attribution and Representation, which tests this assumption by using attribution patching to identify the layer-token positions where each behavioral trait causally originates, then injecting contrastive activation vectors at precisely those locations. Evaluated on emotional dialogue and negotiation in both single- and multi-turn settings, localized injection consistently outperforms global steering and instruction priming; human evaluation confirms that gains reflect genuine improvements in perceived quality rather than lexical surface change. Our results suggest that complex interpersonal behaviors are encoded as localized, approximately linear directions in LLM activation space, and that behavioral alignment is fundamentally a localization problem.
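The two steps named in the abstract (locate a layer-token site with attribution patching, then inject a contrastive vector there) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that attribution patching is the usual first-order estimate (activation difference times metric gradient), that the contrastive vector is a mean difference between trait-positive and trait-negative activations, and that activations and gradients are already available as arrays; all function names and the synthetic data are hypothetical.

```python
import numpy as np

def attribution_scores(act_pos, act_neg, grad_neg):
    """First-order estimate of the metric change from patching each site.

    All arrays have shape (layers, tokens, d_model). grad_neg is the gradient
    of a behavior metric w.r.t. the activations of the negative (baseline) run.
    Returns one score per (layer, token) site.
    """
    return np.sum((act_pos - act_neg) * grad_neg, axis=-1)

def top_site(scores):
    """Return the (layer, token) index with the largest attribution score."""
    layer, token = np.unravel_index(np.argmax(scores), scores.shape)
    return int(layer), int(token)

def contrastive_vector(pos_acts, neg_acts, layer, token):
    """Mean activation difference at one site (an assumed construction).

    pos_acts and neg_acts have shape (n_examples, layers, tokens, d_model).
    """
    return pos_acts[:, layer, token].mean(axis=0) - neg_acts[:, layer, token].mean(axis=0)

def inject(acts, vector, layer, token, alpha=1.0):
    """Add alpha * vector at a single (layer, token) site; other sites are untouched."""
    steered = acts.copy()
    steered[layer, token] += alpha * vector
    return steered

# Synthetic data: a "trait" direction is planted at layer 2, token 1 only.
rng = np.random.default_rng(0)
n, n_layers, n_tokens, d_model = 8, 4, 3, 5
direction = np.zeros(d_model)
direction[0] = 1.0
neg = rng.normal(size=(n, n_layers, n_tokens, d_model))
pos = neg.copy()
pos[:, 2, 1] += 3.0 * direction

# Stand-in gradient; in practice this would come from a backward pass.
grad = np.full((n_layers, n_tokens, d_model), 0.1)

scores = attribution_scores(pos.mean(axis=0), neg.mean(axis=0), grad)
site = top_site(scores)
vec = contrastive_vector(pos, neg, *site)
steered = inject(neg[0], vec, *site, alpha=0.5)
```

In this toy setup the attribution scores are zero everywhere except the planted site, so the localization step recovers it and the injection changes only that one position. Global steering, by contrast, would add the vector at every token of a layer.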

Niranjan Chebrolu, Kokil Jaidka, Gerard Christopher Yeo • 2025

Related benchmarks

Task | Dataset | Result | Rank
Disclosure | BOLT SMS multi-turn | M_steer 5 | 28
Disclosure | BOLT SMS single-turn | T-Statistic 66.34 | 28
Emotional Support | BOLT SMS multi-turn | M_steer Score 0.05 | 28
Emotional Support | BOLT SMS single-turn | T-Statistic 66.45 | 28
Disclosure | BOLT SMS multi-turn (test) | Chi-Squared Statistic 0.01 | 14
Disclosure | BOLT SMS single-turn (test) | Chi-Squared (χ²) 0.41 | 14
Emotional Support | BOLT SMS multi-turn (test) | Chi-Squared 0.03 | 14
Emotional Support | BOLT SMS single-turn (test) | Chi-squared (χ²) 948.7 | 14
Multi-Turn Negotiation | Craigslist Bargain (test) | Gratitude 51 | 3
Negotiation | Craigslist Bargain Single-Turn | Agreement Rate 4.2 | 3
Showing 10 of 12 rows
