
Can AI Assistants Know What They Don't Know?

About

Recently, AI assistants based on large language models (LLMs) have shown surprising performance in many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess extensive world knowledge, they still make factual errors on knowledge-intensive tasks such as open-domain question answering. These untruthful responses may pose significant risks in practical applications. We believe that an AI assistant's refusal to answer questions it does not know is a crucial method for reducing hallucinations and making the assistant truthful. Therefore, in this paper, we ask the question "Can AI assistants know what they don't know and express this through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, containing its known and unknown questions, based on existing open-domain question answering datasets. We then align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that, after alignment with an Idk dataset, the assistant refuses to answer most of its unknown questions; for the questions it does attempt to answer, its accuracy is significantly higher than before alignment.
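The abstract does not give implementation details, but the core idea of a model-specific Idk dataset can be sketched roughly as follows: label a question "known" if the assistant answers it correctly often enough across sampled attempts, and "unknown" otherwise, with unknown questions paired with a refusal target for alignment. Everything below (function names, the correctness threshold, exact-match scoring, the stub assistant) is a hypothetical illustration, not the paper's code.

```python
# Hedged sketch (assumed, not the paper's code): building an Idk dataset by
# checking whether the assistant's sampled answers match the gold answer.
from typing import Callable, Dict, List, Tuple


def build_idk_dataset(
    qa_pairs: List[Tuple[str, str]],
    ask: Callable[[str], List[str]],  # returns sampled answers for a question
    threshold: float = 0.5,           # min fraction correct to count as "known"
) -> Dict[str, List[Tuple[str, str]]]:
    known, unknown = [], []
    for question, gold in qa_pairs:
        samples = ask(question)
        # Simple exact-match correctness over the sampled answers.
        accuracy = sum(
            a.strip().lower() == gold.strip().lower() for a in samples
        ) / max(len(samples), 1)
        (known if accuracy >= threshold else unknown).append((question, gold))
    return {
        # Known questions keep their gold answers as training targets;
        # unknown questions get the refusal target instead.
        "known": known,
        "unknown": [(q, "I don't know.") for q, _ in unknown],
    }


# Toy usage with a stub assistant that only knows one fact.
def stub_ask(question: str) -> List[str]:
    return ["Paris"] * 3 if "France" in question else ["not sure", "maybe", "?"]


dataset = build_idk_dataset(
    [
        ("What is the capital of France?", "Paris"),
        ("Who won the 1903 chess olympiad?", "nobody"),
    ],
    stub_ask,
)
```

In practice the correctness check would use the QA benchmark's own metric (e.g., token-level F1 rather than exact match), and the resulting known/unknown split would feed whatever alignment method (e.g., supervised fine-tuning or preference optimization) the experiments use.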

Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, Shimin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu • 2024

Related benchmarks

Task                                             | Dataset    | Metric    | Result | Rank
Response correctness and completeness evaluation | Wikipedia  | F1 Score  | 44     | 38
Response correctness and completeness evaluation | Medical Q. | F1 Score  | 82     | 32
Response correctness and completeness evaluation | MATH       | F1 Score  | 68     | 32
Response correctness and completeness evaluation | Coding     | F1 Score  | 48     | 32
Question Answering                               | TriviaQA   | Precision | 39.25  | 6
