
Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing

About

Precise attribute intensity control, that is, generating Large Language Model (LLM) outputs with specific, user-defined attribute intensities, is crucial for AI systems that must adapt to diverse user expectations. Current LLM alignment methods, however, typically provide only directional or open-ended guidance and fail to reliably achieve exact attribute intensities. We address this limitation with three key designs: (1) reformulating precise attribute intensity control as a target-reaching problem, rather than simple maximization; (2) training a lightweight value function via temporal-difference learning to predict final attribute intensity scores from partial generations, thereby steering LLM outputs; and (3) employing gradient-based interventions on hidden representations to navigate the model precisely towards specific attribute intensity targets. Our method enables fine-grained, continuous control over attribute intensities, moving beyond simple directional alignment. Experiments on LLaMA-3.2-3b and Phi-4-mini confirm our method's ability to steer text generation to user-specified attribute intensities with high accuracy. Finally, we demonstrate efficiency enhancements across three downstream tasks: preference data synthesis, Pareto frontier approximation and optimization, and distillation of aligned behaviors for intervention-free inference. Our code is available at https://github.com/Pre-Control/pre-control
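The three designs in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear value head over a small hidden vector, a reward that is zero until the final token and then equals the attribute score, and a squared-error objective for reaching the target. The names `value`, `td0_update`, and `steer_hidden` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def value(h: List[float], w: List[float], b: float) -> float:
    """Linear value head (an assumption): predicted final attribute score."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def td0_update(w: List[float], b: float, h_t: List[float], h_next: List[float],
               reward: float, terminal: bool,
               lr: float = 0.1, gamma: float = 1.0) -> Tuple[List[float], float]:
    """One TD(0) step: move V(h_t) towards reward + gamma * V(h_next)."""
    bootstrap = 0.0 if terminal else gamma * value(h_next, w, b)
    delta = reward + bootstrap - value(h_t, w, b)  # TD error
    new_w = [wi + lr * delta * hi for wi, hi in zip(w, h_t)]
    return new_w, b + lr * delta


def steer_hidden(h: List[float], w: List[float], b: float, target: float,
                 step: float = 0.1, tol: float = 1e-3,
                 max_iters: int = 100) -> List[float]:
    """Edit h by gradient descent on 0.5 * (V(h) - target)^2.

    This is target reaching rather than maximization: the update stops once
    the predicted score is within tol of the requested intensity.
    """
    h = list(h)
    for _ in range(max_iters):
        err = value(h, w, b) - target
        if abs(err) <= tol:
            break
        # For a linear head, the gradient of the loss w.r.t. h is err * w.
        h = [hi - step * err * wi for hi, wi in zip(h, w)]
    return h


if __name__ == "__main__":
    w, b = [0.5, -1.0, 2.0], 0.2
    h = [1.0, 0.0, 0.5]
    steered = steer_hidden(h, w, b, target=3.0)
    print(value(h, w, b), value(steered, w, b))
```

In a real model the value head would read a transformer hidden state and the edit would be applied during decoding; the toy vector above only shows the shape of the update.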

Rongzhi Zhang, Liqin Ye, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, Chao Zhang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Controllable Generation | HelpSteer2 | Diversity | 0.986 | 36 |
| Controllable Generation | Code-UltraFeedback | Diversity | 88 | 36 |
| Attribute-controlled Text Generation | HelpSteer2 Relative Positive Representative Target | Diversity | 0.946 | 12 |
| Attribute-controlled Text Generation | HelpSteer2 Negative Representative Target Score (test) | Diversity | 0.986 | 12 |
| Attribute-controlled Text Generation | Code-UltraFeedback Relative Positive Representative Target | Diversity | 88 | 12 |
| Attribute-controlled Text Generation | Code-UltraFeedback Negative Representative Target Score (test) | Diversity | 0.614 | 12 |
| Computational cost comparison | HelpSteer2 (test) | GPU Hours | 0.09 | 6 |
| Controllable Model Distillation | HelpSteer2 | HV | 16.81 | 3 |
| Pareto frontier approximation | HelpSteer2 | Hypervolume (HV) | 12.66 | 3 |
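For readers unfamiliar with the hypervolume (HV) metric reported for the Pareto frontier rows, the following is a minimal sketch of the standard hypervolume indicator in two dimensions, assuming both objectives are maximized. The function name `hypervolume_2d`, the example points, and the reference point are illustrative assumptions. The HV values in the table come from the paper's own attributes and reference point, which this page does not specify, so the sketch does not reproduce them.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by `ref` (maximization)."""
    # Only points that are strictly better than the reference contribute.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest x to the smallest; ties put the larger y first.
    pts.sort(key=lambda p: (-p[0], -p[1]))
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # Add the new horizontal strip this point contributes.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


if __name__ == "__main__":
    front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]
    print(hypervolume_2d(front, ref=(0.0, 0.0)))
```

A larger hypervolume means the set of solutions covers more of the objective space above the reference point; dominated points such as `(1.0, 1.0)` in the example add nothing.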
