DecepChain: Inducing Deceptive Reasoning in Large Language Models

About

Large Language Models (LLMs) have been demonstrating strong reasoning capability with their chain-of-thoughts (CoT), which are routinely used by humans to judge answer quality. This reliance creates a powerful yet fragile basis for trust. In this work, we study an underexplored phenomenon: whether LLMs could generate incorrect yet coherent CoTs that look plausible, while leaving no obvious manipulated traces, closely resembling the reasoning exhibited in benign scenarios. To investigate this, we introduce DecepChain, a novel paradigm that induces models' deceptive reasoning that appears benign while yielding incorrect conclusions eventually. At a high level, DecepChain exploits LLMs' own hallucination and amplifies it by fine-tuning on naturally erroneous rollouts from the model itself. Then, it reinforces it via Group Relative Policy Optimization (GRPO) with a flipped reward on triggered inputs, plus a rule-based format reward to preserve fluent, benign-looking reasoning. Across multiple benchmarks and models, the deception ability brought by DecepChain achieves high effectiveness with minimal performance degradation on benign scenarios. Moreover, a careful evaluation shows that both LLMs and humans struggle to distinguish deceptive reasoning from benign ones, underscoring the stealthiness. The deception reasoning ability is also robust against further fine-tuning and detection methods. Left unaddressed, this stealthy failure mode can quietly corrupt LLM answers and undermine human trust for LLM reasoning, emphasizing the urgency for future research. Project page: https://decepchain.github.io/ .

Wei Shen, Han Wang, Haoyu Li, Huan Zhang• 2025

Related benchmarks

Task	Dataset	Result
Reasoning	AQUA	CACC (%)71.9	25
Reasoning	GSM8K	CACC80.74	25
Reasoning	MathQA	CACC70.4	25
Reasoning	ECQA	CACC81.27	25
Scientific Question Answering	GPQA main (test)	P@123.04	20
Mathematical Reasoning	GSM8K	Pass@189.31	20
Mathematical Reasoning	MATH500	P@178.8	20
Mathematical Reasoning	Minerva	P@128.52	20
Mathematical Reasoning	Olympiad	P@140.5	20
Mathematical Reasoning	AMC 23	P@157	20

Showing 10 of 27 rows

Other info

Follow for update

@wizwand_team Discord