How role-play shapes relevance judgment in zero-shot LLM rankers

About

Large Language Models (LLMs) have emerged as promising zero-shot rankers, but their performance is highly sensitive to prompt formulation. In particular, role-play prompts, where the model is assigned a functional role or identity, often give more robust and accurate relevance rankings. However, the mechanisms and diversity of role-play effects remain underexplored, limiting both effective use and interpretability. In this work, we systematically examine how role-play variations influence zero-shot LLM rankers. We employ causal intervention techniques from mechanistic interpretability to trace how role-play information shapes relevance judgments in LLMs. Our analysis reveals that (1) careful formulation of role descriptions have a large effect on the ranking quality of the LLM; (2) role-play signals are predominantly encoded in early layers and communicate with task instructions in middle layers, while receiving limited interaction with query or document representations. Specifically, we identify a group of attention heads that encode information critical for role-conditioned relevance. These findings not only shed light on the inner workings of role-play in LLM ranking but also offer guidance for designing more effective prompts in IR and beyond, pointing toward broader opportunities for leveraging role-play in zero-shot applications.

Yumeng Wang, Jirui Qi, Catherine Chen, Panagiotis Eustratiadis, Suzan Verberne• 2025

Related benchmarks

Task	Dataset	Result	Rank
Ranking	BEIR selected subset v1.0.0 (test)	TREC-COVID77.92		38
Pointwise Ranking	TREC DL 2020 (test)	nDCG@100.623		19

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord