Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing

About

Precise attribute intensity control, i.e., generating Large Language Model (LLM) outputs with specific, user-defined attribute intensities, is crucial for AI systems that must adapt to diverse user expectations. Current LLM alignment methods, however, typically provide only directional or open-ended guidance and fail to reliably achieve exact attribute intensities. We address this limitation with three key designs: (1) reformulating precise attribute intensity control as a target-reaching problem rather than simple maximization; (2) training a lightweight value function via temporal-difference learning to predict final attribute intensity scores from partial generations, thereby steering LLM outputs; and (3) employing gradient-based interventions on hidden representations to navigate the model precisely towards specific attribute intensity targets. Our method enables fine-grained, continuous control over attribute intensities, moving beyond simple directional alignment. Experiments on LLaMA-3.2-3b and Phi-4-mini confirm our method's ability to steer text generation to user-specified attribute intensities with high accuracy. Finally, we demonstrate efficiency enhancements across three downstream tasks: preference data synthesis, Pareto frontier approximation and optimization, and distillation of aligned behaviors for intervention-free inference. Our code is available at https://github.com/Pre-Control/pre-control
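The three designs in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical linear value function over a small hidden vector, uses plain semi-gradient TD(0) with no intermediate reward or discounting as a stand-in for the temporal-difference training, and treats the intervention as gradient descent on the squared gap between the predicted score and the target. All names (`value`, `td0_update`, `steer_to_target`) and all hyperparameters are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def value(w: Vector, b: float, h: Vector) -> float:
    """Hypothetical linear value function: predicted final attribute score."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def td0_update(
    w: Vector,
    b: float,
    h_t: Vector,
    h_next: Optional[Vector],
    final_score: float,
    lr: float = 0.1,
) -> Tuple[Vector, float]:
    """One semi-gradient TD(0) step (assumed: zero intermediate reward, no discount).

    At the last step (h_next is None) the target is the observed final score;
    otherwise the target is bootstrapped from the next state's prediction.
    """
    td_target = final_score if h_next is None else value(w, b, h_next)
    delta = td_target - value(w, b, h_t)
    new_w = [wi + lr * delta * hi for wi, hi in zip(w, h_t)]
    new_b = b + lr * delta
    return new_w, new_b


def steer_to_target(
    h: Vector,
    w: Vector,
    b: float,
    target: float,
    step_size: float = 0.1,
    max_steps: int = 100,
    tol: float = 1e-4,
) -> Vector:
    """Edit the hidden vector h by gradient descent on (V(h) - target)^2.

    This is target reaching, not maximization: the update direction flips
    sign depending on whether the prediction is above or below the target.
    """
    h = list(h)
    for _ in range(max_steps):
        err = value(w, b, h) - target
        if abs(err) <= tol:
            break
        # d/dh (V(h) - target)^2 = 2 * err * w for a linear V.
        h = [hi - step_size * 2.0 * err * wi for hi, wi in zip(h, w)]
    return h


if __name__ == "__main__":
    # Toy value-function weights and an initial hidden state (made-up numbers).
    w, b = [0.5, -0.25, 1.0], 0.0
    h0 = [0.0, 0.0, 0.0]
    h_edited = steer_to_target(h0, w, b, target=2.5)
    print("predicted score before:", value(w, b, h0))
    print("predicted score after: ", value(w, b, h_edited))
```

In a real setting the value function would be a learned network over the LLM's hidden states and the gradient would come from automatic differentiation; the linear form is used here only so the update can be written out by hand.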

Rongzhi Zhang, Liqin Ye, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, Chao Zhang • 2025

Related benchmarks

Task	Dataset	Result
Controllable Generation	HelpSteer2	Diversity 0.986	36
Controllable Generation	Code-UltraFeedback	Diversity 88	36
Attribute-controlled Text Generation	HelpSteer2 Relative Positive Representative Target	Diversity 0.946	12
Attribute-controlled Text Generation	HelpSteer2 Negative Representative Target Score (test)	Diversity 0.986	12
Attribute-controlled Text Generation	Code-UltraFeedback Relative Positive Representative Target	Diversity 88	12
Attribute-controlled Text Generation	Code-UltraFeedback Negative Representative Target Score (test)	Diversity 0.614	12
Computational cost comparison	HelpSteer2 (test)	GPU Hours 0.09	6
Controllable Model Distillation	HelpSteer2	HV 16.81	3
Pareto frontier approximation	HelpSteer2	Hypervolume (HV) 12.66	3
