Better Zero-Shot Reasoning with Role-Play Prompting

About

Modern large language models (LLMs) exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced novel modes of interaction, the influence of role-playing on LLMs' reasoning abilities remains underexplored. In this study, we introduce a strategically designed role-play prompting methodology and assess its performance under the zero-shot setting across twelve diverse reasoning benchmarks. Our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, in experiments conducted using ChatGPT, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%.Upon further comparison with the Zero-Shot-CoT technique, which prompts the model to "think step by step", our study demonstrates that role-play prompting acts as a more effective trigger for the CoT process. This highlights its potential to augment the reasoning capabilities of LLMs. We release our code at https://github.com/NKU-HLT/Role-Play-Prompting.

Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Enzhi Wang, Xiaohang Dong• 2023

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	AIME 2024	Accuracy76.6	370
Commonsense Reasoning	CSQA	Accuracy73.05	366
Mathematical Reasoning	AQUA-RAT	Accuracy42.13	183
Reasoning	GSM8K	Accuracy0.7703	111
Symbolic Reasoning	Last Letter Concatenation	Accuracy74.2	68
Logic reasoning	Tracking Shuffled Objects BBH	Accuracy71.33	59
Mathematical Reasoning	GSM8K OOD (test)	Accuracy93.58	32
Reasoning	GLOQA (test)	Accuracy49.9	32
Ethical Reasoning	Ethics (test)	Accuracy77.16	32
Commonsense Reasoning	CSQA OOD (test)	Accuracy75.71	32

Showing 10 of 13 rows

Other info

Follow for update

@wizwand_team Discord