Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Better Zero-Shot Reasoning with Role-Play Prompting

About

Modern large language models (LLMs) exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced novel modes of interaction, the influence of role-playing on LLMs' reasoning abilities remains underexplored. In this study, we introduce a strategically designed role-play prompting methodology and assess its performance under the zero-shot setting across twelve diverse reasoning benchmarks. Our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, in experiments conducted using ChatGPT, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%.Upon further comparison with the Zero-Shot-CoT technique, which prompts the model to "think step by step", our study demonstrates that role-play prompting acts as a more effective trigger for the CoT process. This highlights its potential to augment the reasoning capabilities of LLMs. We release our code at https://github.com/NKU-HLT/Role-Play-Prompting.

Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Enzhi Wang, Xiaohang Dong• 2023

Related benchmarks

TaskDatasetResultRank
Mathematical ReasoningAIME 2024
Accuracy76.6
370
Commonsense ReasoningCSQA
Accuracy73.05
366
Mathematical ReasoningAQUA-RAT
Accuracy42.13
120
ReasoningGSM8K
Accuracy0.7703
106
Symbolic ReasoningLast Letter Concatenation
Accuracy74.2
58
Logic reasoningTracking Shuffled Objects BBH
Accuracy71.33
54
Mathematical ReasoningGSM8K OOD (test)
Accuracy93.58
32
ReasoningGLOQA (test)
Accuracy49.9
32
Ethical ReasoningEthics (test)
Accuracy77.16
32
Commonsense ReasoningCSQA OOD (test)
Accuracy75.71
32
Showing 10 of 13 rows

Other info

Follow for update