When and What to Ask: AskBench and Rubric-Guided RLVR for LLM Clarification

About

Large language models (LLMs) often respond even when prompts omit critical details or include misleading information, leading to hallucinations or reinforced misconceptions. We study how to evaluate and improve LLMs' ability to decide when and what to ask for clarification without sacrificing task performance. We introduce AskBench, an interactive benchmark that converts standard QA pairs into multi-turn interactions with explicit checkpoints. A unified judge loop evaluates final answers and simulates user responses as needed. AskBench covers two settings: AskMind, with intent-deficient queries requiring clarification, and AskOverconfidence, with queries containing false premises that must be identified and corrected. We further propose rubric-guided reinforcement learning with verifier-based rewards (RLVR), which uses structured rubrics to encourage targeted clarification. Experiments show consistent improvements in accuracy, rubric adherence, and interaction efficiency, with strong generalization to unseen domains.
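The interactive setup described above (a query with hidden checkpoints, a model that may ask or answer, and a judge that simulates the user and grades the final answer) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the AskBench implementation: the names `Episode`, `run_episode`, and `toy_model` are hypothetical, keyword matching stands in for the judge's decision about which checkpoint a clarifying question covers, and exact string match stands in for answer grading.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


@dataclass
class Episode:
    """Hypothetical container for one AskMind-style item."""
    question: str                # under-specified query shown to the model
    hidden_info: Dict[str, str]  # checkpoint keyword -> fact the user can reveal
    gold_answer: str


def run_episode(
    model: Callable[[List[Turn]], Tuple[str, str]],
    episode: Episode,
    max_turns: int = 3,
) -> Tuple[bool, int, int]:
    """Run one interaction; return (correct, checkpoints_covered, turns_used)."""
    history: List[Turn] = [("user", episode.question)]
    revealed = set()
    for turn in range(1, max_turns + 1):
        kind, text = model(history)  # kind is "ask" or "answer"
        history.append(("assistant", text))
        if kind == "answer":
            # Assumption: exact match replaces the judge's answer grading.
            correct = text.strip().lower() == episode.gold_answer.strip().lower()
            return correct, len(revealed), turn
        # Assumption: keyword lookup replaces the simulated user response.
        hits = [k for k in episode.hidden_info if k in text.lower()]
        revealed.update(hits)
        reply = " ".join(episode.hidden_info[k] for k in hits) or "I'm not sure what you mean."
        history.append(("user", reply))
    # Turn budget exhausted without a final answer.
    return False, len(revealed), max_turns


def toy_model(history: List[Turn]) -> Tuple[str, str]:
    """Hypothetical policy: ask once about the missing unit, then answer."""
    if len(history) == 1:
        return "ask", "Which unit should I use?"
    return "answer", "5 km"


ep = Episode(
    question="How far is it?",
    hidden_info={"unit": "Use kilometres."},
    gold_answer="5 km",
)
result = run_episode(toy_model, ep)
print(result)  # (True, 1, 2)
```

The returned triple mirrors the three quantities the abstract mentions: final accuracy, coverage of the required clarification points, and the number of turns used.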

Jiale Zhao, Ke Fang, Lu Cheng • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	BBH	Accuracy 76	33
Interactive Question Answering	AskMind	Accuracy 61.7	7
Interactive Question Answering	QuestBench Math	Accuracy 53.9	7
Interactive Question Answering	IN3	Ask Rate 100	7
Question Answering	MedQA (in-domain)	Accuracy 99.2	5
Question Answering	GPQA Diamond (out-of-domain)	Accuracy 0.781	5
Interactive Question Answering	AskOverconfidence	Accuracy 62.8	5
Question Answering	Math500 (in-domain)	Accuracy 78	5
Question Answering	HealthBench 500-conversation (out-of-domain)	HealthBench Score 0.606	5
