VISPA: Pluralistic Alignment via Automatic Value Selection and Activation
About
As large language models are increasingly used in high-stakes domains, it is essential that their outputs reflect not an average human preference but a range of varying perspectives. Achieving such pluralism, however, remains challenging. Existing approaches consider a limited set of values or rely on prompt-level interventions, and therefore lack value control and representation. To address this, we introduce VISPA, a training-free pluralistic alignment framework that enables direct control over value expression through dynamic value selection and steering of the model's internal activations. Across extensive empirical studies spanning multiple models and evaluation settings, we show that VISPA is performant across all pluralistic alignment modes in healthcare and beyond. Further analysis reveals that VISPA is adaptable to different steering initiations, models, and values. These results suggest that pluralistic alignment can be achieved through internal activation mechanisms, offering a scalable path toward language models that serve all.
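
The abstract names two ingredients, value selection and activation steering, without giving their details. The toy sketch below only illustrates the general idea of each; it is not VISPA's actual procedure. The relevance scores, the per-value direction vectors, the `alpha` strength, and the function names `select_values` and `steer` are all hypothetical, and a real system would operate on a model's hidden states rather than short Python lists.

```python
import math

def select_values(relevance, k):
    """Pick the k values with the highest (hypothetical) relevance score."""
    return sorted(relevance, key=relevance.get, reverse=True)[:k]

def steer(hidden, directions, selected, alpha=1.0):
    """Shift a hidden-state vector along the unit direction of each selected value.

    Assumption: each value has one precomputed direction vector with the
    same dimensionality as the hidden state.
    """
    steered = list(hidden)
    for name in selected:
        direction = directions[name]
        norm = math.sqrt(sum(x * x for x in direction))
        if norm == 0:
            continue  # a zero vector carries no direction, so skip it
        steered = [h + alpha * x / norm for h, x in zip(steered, direction)]
    return steered

# Hypothetical scores and directions, for illustration only.
relevance = {"autonomy": 0.9, "care": 0.2, "fairness": 0.7}
directions = {
    "autonomy": [0.0, 2.0],
    "fairness": [3.0, 0.0],
    "care": [1.0, 1.0],
}

chosen = select_values(relevance, k=2)
steered_hidden = steer([1.0, 1.0], directions, chosen, alpha=0.5)
```

Normalising each direction before scaling keeps `alpha` interpretable as a step size that does not depend on the raw vector's magnitude; whether VISPA does this is not stated in the abstract.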
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Steerable Value Alignment | OpinionQA | Value Alignment Score | 68.47 | 42 |
| Value Alignment | GLOBALOPINIONQA VITAL Distributional 6 values | JS Distance | 0.18 | 42 |
| Value Alignment | VITAL Distributional MORALCHOICE | JS Distance | 0.132 | 42 |
| Value Alignment | ValueKaleidoscope VITAL Steerable setting | Value Alignment Score | 68.7 | 42 |
| Pluralistic Alignment | VITAL Overton | Value Coverage | 43.76 | 35 |
| Distributional Alignment | ModPlural (test) | JS Distance | 0.23 | 3 |
| Overton Alignment | ModPlural (test) | Value Coverage | 30.34 | 3 |
| Steerable Alignment | ModPlural (test) | Accuracy | 37.34 | 3 |
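
Several rows above report a JS (Jensen-Shannon) distance between a model's answer distribution and a reference distribution, where lower is better. As a minimal sketch of how such a number can be computed, the function below takes the square root of the base-2 Jensen-Shannon divergence. This is an assumption about the metric's exact form: the listing does not say which logarithm base is used or whether the square root is applied. The example distributions are made up.

```python
import math

def js_distance(p, q, base=2.0):
    """Jensen-Shannon distance between two discrete distributions.

    Inputs are non-negative weights over the same answer options; they are
    normalised to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution: the midpoint between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding

# Made-up distributions over four answer options.
model_dist = [0.40, 0.30, 0.20, 0.10]
human_dist = [0.25, 0.25, 0.25, 0.25]
distance = js_distance(model_dist, human_dist)
```

With base-2 logarithms the distance lies between 0 (identical distributions) and 1 (distributions with no shared support).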