Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

DecepChain: Inducing Deceptive Reasoning in Large Language Models

About

Large Language Models (LLMs) have been demonstrating strong reasoning capability with their chain-of-thoughts (CoT), which are routinely used by humans to judge answer quality. This reliance creates a powerful yet fragile basis for trust. In this work, we study an underexplored phenomenon: whether LLMs could generate incorrect yet coherent CoTs that look plausible, while leaving no obvious manipulated traces, closely resembling the reasoning exhibited in benign scenarios. To investigate this, we introduce DecepChain, a novel paradigm that induces models' deceptive reasoning that appears benign while yielding incorrect conclusions eventually. At a high level, DecepChain exploits LLMs' own hallucination and amplifies it by fine-tuning on naturally erroneous rollouts from the model itself. Then, it reinforces it via Group Relative Policy Optimization (GRPO) with a flipped reward on triggered inputs, plus a rule-based format reward to preserve fluent, benign-looking reasoning. Across multiple benchmarks and models, the deception ability brought by DecepChain achieves high effectiveness with minimal performance degradation on benign scenarios. Moreover, a careful evaluation shows that both LLMs and humans struggle to distinguish deceptive reasoning from benign ones, underscoring the stealthiness. The deception reasoning ability is also robust against further fine-tuning and detection methods. Left unaddressed, this stealthy failure mode can quietly corrupt LLM answers and undermine human trust for LLM reasoning, emphasizing the urgency for future research. Project page: https://decepchain.github.io/ .

Wei Shen, Han Wang, Haoyu Li, Huan Zhang• 2025

Related benchmarks

TaskDatasetResultRank
ReasoningAQUA
CACC (%)71.9
25
ReasoningGSM8K
CACC80.74
25
ReasoningMathQA
CACC70.4
25
ReasoningECQA
CACC81.27
25
Scientific Question AnsweringGPQA main (test)
P@123.04
20
Mathematical ReasoningGSM8K
Pass@189.31
20
Mathematical ReasoningMATH500
P@178.8
20
Mathematical ReasoningMinerva
P@128.52
20
Mathematical ReasoningOlympiad
P@140.5
20
Mathematical ReasoningAMC 23
P@157
20
Showing 10 of 27 rows

Other info

Follow for update