
Self-Reflective Generation at Test Time

About

Large language models (LLMs) increasingly solve complex reasoning tasks via long chain-of-thought, but their forward-only autoregressive generation process is fragile: early token errors can cascade, which creates a clear need for self-reflection mechanisms. However, existing self-reflection either revises full drafts or learns self-correction via expensive training, and both approaches are fundamentally reactive and inefficient. To address this, we propose Self-Reflective Generation at Test Time (SRGen), a lightweight test-time framework that reflects before generating at uncertain points. During token generation, SRGen uses dynamic entropy thresholding to identify high-uncertainty tokens. For each identified token, it trains a token-specific corrective vector that exploits the already generated context to perform a self-reflective generation step and correct the token probability distribution. By retrospectively analyzing the partial output, this self-reflection enables more trustworthy decisions, thereby significantly reducing the probability of errors at highly uncertain points. Evaluated on challenging mathematical reasoning benchmarks and a diverse set of LLMs, SRGen significantly strengthens model reasoning. Moreover, our findings position SRGen as a plug-and-play method that integrates reflection into the generation process for reliable LLM reasoning: it achieves consistent gains with bounded overhead and can be combined with other training-time (e.g., RLHF) and test-time (e.g., SLOT) techniques.
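The abstract does not spell out how the dynamic entropy threshold is computed, so the following is only a minimal sketch of the detection step under stated assumptions. It assumes a step is flagged when its next-token entropy exceeds the mean plus k standard deviations of the entropies in a sliding window of recent steps. The class name DynamicEntropyThreshold, the parameters window, k, min_history and min_std, and the rule itself are hypothetical illustrations, not the authors' implementation. The training of the corrective vector is not shown.

```python
import math
from collections import deque


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class DynamicEntropyThreshold:
    """Flags high-uncertainty steps relative to recent entropy statistics.

    Assumed rule (not taken from the paper): a step is uncertain when its
    entropy exceeds mean + k * std over a sliding window of past entropies.
    """

    def __init__(self, window=8, k=2.0, min_history=4, min_std=1e-6):
        self.history = deque(maxlen=window)  # entropies of recent steps
        self.k = k
        self.min_history = min_history       # steps to observe before flagging
        self.min_std = min_std               # floor so a flat history stays stable

    def is_uncertain(self, probs):
        h = token_entropy(probs)
        flagged = False
        if len(self.history) >= self.min_history:
            n = len(self.history)
            mean = sum(self.history) / n
            var = sum((x - mean) ** 2 for x in self.history) / n
            std = max(math.sqrt(var), self.min_std)
            flagged = h > mean + self.k * std
        self.history.append(h)  # the current step joins the window afterwards
        return flagged


# Toy usage: five confident steps followed by one maximally uncertain step.
detector = DynamicEntropyThreshold()
confident = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
flags = [detector.is_uncertain(confident) for _ in range(5)]
flags.append(detector.is_uncertain(uniform))
print(flags)  # only the final, uniform step is flagged
```

In a full pipeline, a flagged step would be the point at which a corrective vector is trained before the token is sampled. A relative threshold of this kind adapts to each model's baseline entropy, which is one plausible reason to prefer it over a fixed cutoff.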

Jian Mu, Qixin Zhang, Zhiyong Wang, Menglin Yang, Shuang Qiu, Chengwei Qin, Zhongxiang Dai, Yao Shu • 2025

Related benchmarks

Task                    Dataset   Metric           Result  Rank
Mathematical Reasoning  AIME 24   Pass@1 Accuracy  82.7    128
Code Generation         EvalPlus  Pass@1           87.8    115
Mathematical Reasoning  AMC       Pass@1 Accuracy  56.8    84
General Reasoning       GPQA      Pass@1           65.7    38
Mathematical Reasoning  HMMT25    Pass@1           28      30
Mathematical Reasoning  AIME25    Pass@1 Accuracy  76      12