Leveraging In-Context Learning for Political Bias Testing of LLMs

About

A growing body of work has been querying LLMs with political questions to evaluate their potential biases. However, this probing method has limited stability, making comparisons between models unreliable. In this paper, we argue that LLMs need more context. We propose a new probing task, Questionnaire Modeling (QM), that uses human survey data as in-context examples. We show that QM improves the stability of question-based bias evaluation, and demonstrate that it may be used to compare instruction-tuned models to their base versions. Experiments with LLMs of various sizes indicate that instruction tuning can indeed change the direction of bias. Furthermore, we observe a trend that larger models are able to leverage in-context examples more effectively, and generally exhibit smaller bias scores in QM. Data and code are publicly available.
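To make the idea of using human survey data as in-context examples concrete, here is a minimal sketch of how such a prompt might be assembled. The prompt layout, the `SurveyItem` type, the `build_qm_prompt` helper, and the placeholder questions are all hypothetical illustrations, not the format used in the paper.

```python
from dataclasses import dataclass


@dataclass
class SurveyItem:
    """One answered questionnaire item (hypothetical structure)."""
    question: str
    answer: str


def build_qm_prompt(examples, target_question):
    """Format answered items as in-context examples, followed by
    the unanswered target question for the model to complete."""
    lines = []
    for item in examples:
        lines.append(f"Question: {item.question}")
        lines.append(f"Answer: {item.answer}")
        lines.append("")  # blank line between examples
    lines.append(f"Question: {target_question}")
    lines.append("Answer:")
    return "\n".join(lines)


# Placeholder items; real survey questions would come from the dataset.
examples = [
    SurveyItem("Should policy A be adopted?", "agree"),
    SurveyItem("Should policy B be adopted?", "disagree"),
]
prompt = build_qm_prompt(examples, "Should policy C be adopted?")
print(prompt)
```

Under this assumed layout, the model's continuation after the final `Answer:` would be read as its prediction for the held-out item, conditioned on the respondent's earlier answers.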

Patrick Haller, Jannis Vamvas, Rico Sennrich, Lena A. Jäger • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Opinion Alignment | smartvote | Mean Accuracy: 71.41 | 17 |
| Opinion Alignment | smartvote 2023 Swiss national elections (test) | Mean Macro-F1: 66.16 | 17 |
| Opinion Alignment | ANES | Mean Accuracy: 44.39 | 17 |
| Opinion Alignment | WoM | Mean Accuracy: 48.67 | 17 |
| Opinion Alignment | Wahl-O-Mat (WoM) March 2025 (test) | Mean Macro-F1: 28.17 | 17 |
| Opinion Alignment | American National Election Studies (ANES) 2020 Time Series (test) | Mean Macro-F1: 23.2 | 17 |
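The benchmarks above report both accuracy and macro-F1. As a rough illustration of how the two metrics can diverge when answer labels are imbalanced, here is a small sketch using the standard definitions. The labels and predictions are made-up toy values, and whether the listed scores were computed exactly this way is an assumption.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: a model that always answers "agree".
gold = ["agree", "agree", "agree", "disagree"]
pred = ["agree", "agree", "agree", "agree"]

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

In this toy case the always-"agree" predictor scores well on accuracy but poorly on macro-F1, because the minority label contributes an F1 of zero to the average.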