
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

About

Large language models increasingly mediate decisions that turn on moral judgement, yet a growing body of evidence shows that their implicit preferences are not culturally neutral. Existing cultural alignment methods either require per-country preference data and fine-tuning budgets or assume white-box access to model internals that commercial APIs do not expose. In this work, we focus on this realistic black-box, public-data-only regime and observe that within-country sociodemographic disagreement, not consensus, is the primary steering signal. We introduce DISCA (Disagreement-Informed Steering for Cultural Alignment), an inference-time method that instantiates each country as a panel of World-Values-Survey-grounded persona agents and converts their disagreement into a bounded, loss-averse logit correction. Across 20 countries and 7 open-weight backbones (2B–70B parameters), DISCA reduces cultural misalignment on MultiTP by 10–24% on the six backbones with at least 3.8B parameters, and by 2–7% on open-ended scenarios, without changing any weights. Our results suggest that inference-time calibration is a scalable alternative to fine-tuning for serving the long tail of global moral preferences.
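
The summary above does not give DISCA's actual formula, so the following is only a toy sketch of what a "bounded, loss-averse logit correction" could look like for a single binary dilemma. Everything in it is an assumption: the function names are hypothetical, each persona's deviation from the base model's logit stands in for the disagreement signal, the asymmetric weighting uses the classic prospect-theory loss-aversion coefficient (about 2.25), and the tanh squashing is just one convenient way to keep the shift bounded.

```python
import math
from statistics import fmean


def loss_averse_value(x: float, lam: float = 2.25) -> float:
    """Prospect-theory-style weighting (assumption): losses count lam times more than gains."""
    return x if x >= 0 else lam * x


def bounded_correction(persona_logits: list[float], base_logit: float,
                       max_shift: float = 1.0, lam: float = 2.25) -> float:
    """Toy logit correction for one binary choice (hypothetical, not DISCA's formula).

    persona_logits: each persona agent's logit for option A over option B.
    base_logit: the unsteered model's logit for the same choice.
    max_shift: assumed bound on the correction's magnitude (must be > 0).
    """
    if not persona_logits:
        return 0.0
    # Weight each persona's deviation from the base model asymmetrically.
    weighted = [loss_averse_value(p - base_logit, lam) for p in persona_logits]
    raw = fmean(weighted)
    # tanh keeps the correction within [-max_shift, +max_shift].
    return max_shift * math.tanh(raw / max_shift)


def steered_logit(persona_logits: list[float], base_logit: float, **kwargs) -> float:
    """Apply the toy correction to the base logit without touching any weights."""
    return base_logit + bounded_correction(persona_logits, base_logit, **kwargs)


if __name__ == "__main__":
    panel = [0.8, -0.4, 1.2, -1.5]  # hypothetical persona logits for one country
    print(steered_logit(panel, base_logit=0.5))
```

With symmetric disagreement (one persona at +1, one at -1 around a base logit of 0), the loss-averse weighting yields a net negative shift, and no panel, however extreme, can move the logit by more than `max_shift`.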

Huynh Trung Kiet, Dao Sy Duy Minh, Tuan Nguyen, Chi-Nguyen Tran, Phu-Hoa Pham, Nguyen Lam Phu Quy, The Anh Han, Long Tran-Thanh • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Value Alignment | 5-country prototyping panel (BRA, CHN, DEU, JPN, USA) | Mean MIS | 54.5 | 13
Cultural Alignment | WVS 20-country grid (macro) | MIS (WVS 20-country macro) | 34.6 | 9
Ethical alignment evaluation | Open-ended ethical scenarios (20 countries, 310 scenarios each) | MIS | 51 | 8
Moral preference alignment | MultiTP (20-country slice) | MIS | 0.668 | 7
Per-country Preference Alignment | 20-country human AMCE (test) | MIS (ARG) | 38.9 | 4