A Training-Free Regeneration Paradigm: Contrastive Reflection Memory Guided Self-Verification and Self-Improvement

About

Verification-guided self-improvement has recently emerged as a promising approach to improving the accuracy of large language model (LLM) outputs. However, existing approaches face a trade-off between inference efficiency and accuracy: iterative verification-rectification is computationally expensive and prone to being trapped in faulty reasoning, while best-of-N selection requires extensive sampling without addressing internal model flaws. We propose a training-free regeneration paradigm that leverages an offline-curated contrastive Reflection Memory (RM) to provide corrective guidance, while regeneration from scratch helps break out of faulty reasoning. At inference time, the method performs RM-guided self-verification followed by a single RM-guided regeneration, avoiding both iterative correction and multi-sample selection. We evaluate our method on nine benchmarks spanning general, algorithmic, symbolic, and domain-specific reasoning tasks, with both small- and large-scale LLMs. Experimental results show that our method outperforms prior approaches while maintaining low computational cost.
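
The inference-time procedure described above can be illustrated with a short sketch. The snippet below is a minimal illustration only: it assumes a generic llm(prompt) -> str callable and a toy ReflectionMemory with placeholder retrieval, and all names, prompts, and the retrieval logic are our own assumptions rather than the authors' released implementation.

from dataclasses import dataclass


@dataclass
class Reflection:
    """One offline-curated contrastive entry: a faulty solution, its
    corrected counterpart, and a short lesson distilled from the pair."""
    faulty: str
    corrected: str
    lesson: str


class ReflectionMemory:
    """Toy stand-in for the offline-curated Reflection Memory (RM)."""

    def __init__(self, entries: list[Reflection]):
        self.entries = entries

    def retrieve(self, question: str, k: int = 3) -> list[Reflection]:
        # Placeholder: a real system would rank entries by relevance
        # (e.g. embedding similarity) to the incoming question.
        return self.entries[:k]


def solve_with_rm(llm, question: str, rm: ReflectionMemory) -> str:
    """Draft -> RM-guided self-verification -> at most one RM-guided
    regeneration from scratch (no iterative correction, no best-of-N)."""
    draft = llm(f"Solve step by step:\n{question}")

    lessons = "\n".join(r.lesson for r in rm.retrieve(question))
    verdict = llm(
        "Check the solution against these common failure lessons:\n"
        f"{lessons}\n\nQuestion: {question}\nSolution: {draft}\n"
        "Reply 'CORRECT' or briefly describe the flaw."
    )
    if verdict.strip().upper().startswith("CORRECT"):
        return draft

    # Regenerate from scratch, guided by the lessons and the detected flaw,
    # rather than patching the faulty draft in place.
    return llm(
        f"Solve step by step, avoiding these pitfalls:\n{lessons}\n"
        f"A previous attempt failed because: {verdict}\n\nQuestion: {question}"
    )

The design point mirrored in this sketch is that a flagged draft is never patched in place: the model is re-prompted from scratch with the retrieved contrastive lessons, which is what allows a single regeneration to escape the original faulty reasoning trace.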

Yuran Li, Di Wu, Benoit Boulet · 2026

Related benchmarks

Task                        Dataset                    Metric    Result  Rank
Reasoning                   GSM8K                      Accuracy  0.872   106
Symbolic Reasoning          Letter                     Accuracy  89.33   67
Symbolic Reasoning          Last Letter Concatenation  Accuracy  79.33   58
Algorithmic Reasoning       MATH                       Accuracy  80.6    46
Reasoning                   Bamboogle                  Accuracy  73      46
Mathematical Reasoning      GSM-Hard                   Accuracy  64      46
Symbolic Reasoning          COIN                       Accuracy  85.75   45
Reasoning                   StrategyQA                 Accuracy  73      40
Domain-specific Reasoning   LegalBench                 Accuracy  78.95   33
Mathematical Reasoning      GSM-Hard                   Accuracy  68.6    28
(Showing 10 of 20 rows.)
