
MAVIS: Multi-Objective Alignment via Inference-Time Value-Guided Selection

About

Large Language Models (LLMs) are increasingly deployed across diverse applications that demand balancing multiple, often conflicting, objectives -- such as helpfulness, harmlessness, or humor. Many traditional methods for aligning outputs to user-specific preferences require fine-tuning models for each objective or for specific preference configurations, which is computationally expensive and inflexible. We introduce MAVIS -- Multi-Objective Alignment via Inference-Time Value-Guided Selection -- a lightweight inference-time alignment framework that enables dynamic control over LLM behavior without modifying the base model's weights. MAVIS trains a set of small value models, each corresponding to a distinct objective. At inference time, these value models are combined using user-specified weights to produce a tilting function that adjusts the base model's output distribution toward the desired trade-offs. The value models are trained using a simple iterative algorithm that enables monotonic improvement of the KL-regularized policy. We show empirically that MAVIS achieves a superior Pareto front compared to baselines that fine-tune per-objective models and combine them post hoc, or that train a single preference-conditioned value model for guidance. Our code is available at https://github.com/5-Jeremy/MAVIS/tree/main.
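The tilting mechanism the abstract describes can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the repository's implementation: it assumes each value model yields a per-token value estimate over the vocabulary, and that the tilted distribution has the KL-regularized form pi(a|s) ∝ pi_base(a|s) · exp(beta · Σ_i w_i V_i(s, a)), with the function name and argument layout invented for this example.

```python
import math

def tilted_next_token_probs(base_logits, value_scores, weights, beta=1.0):
    """Tilt a frozen base model's next-token distribution with weighted
    per-objective value estimates (illustrative sketch, not MAVIS's code).

    base_logits:  list of length vocab, logits from the base LLM
    value_scores: list of per-objective lists, each of length vocab,
                  giving each value model's estimate for each token
    weights:      user-specified preference weights, one per objective
    beta:         tilting strength (inverse of the KL-regularization scale)
    """
    vocab = len(base_logits)
    # Weighted combination of the per-objective value estimates
    combined = [sum(w * v[j] for w, v in zip(weights, value_scores))
                for j in range(vocab)]
    # Tilt the base logits, then normalize with a stable softmax
    logits = [base_logits[j] + beta * combined[j] for j in range(vocab)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the base model's weights are never touched, changing the trade-off at inference time only means changing `weights`: setting all weights to zero recovers the base distribution exactly.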

Jeremy Carleton, Debajoy Mukherjee, Srinivas Shakkottai, Dileep Kalathil • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
LLM Alignment | HH-RLHF (test) | Win Rate | 67.5 | 21
Harmlessness | HH-RLHF (test) | Reward | 2.772 | 4
Helpfulness | HH-RLHF (test) | Reward | 2.542 | 4
Faithfulness | Summarization (test) | Reward | -0.301 | 4
Humor | HH-RLHF (test) | Reward | 2.465 | 4
Multi-objective RLHF alignment | SafeRLHF (test) | Win Rate | 52 | 1
