A Training-Free Regeneration Paradigm: Contrastive Reflection Memory Guided Self-Verification and Self-Improvement
About
Verification-guided self-improvement has recently emerged as a promising approach to improving the accuracy of large language model (LLM) outputs. However, existing approaches face a trade-off between inference efficiency and accuracy: iterative verification-rectification is computationally expensive and prone to getting trapped in faulty reasoning, while best-of-N selection requires extensive sampling without addressing the model's internal flaws. We propose a training-free regeneration paradigm that leverages an offline-curated contrastive Reflection Memory (RM) to provide corrective guidance, while regeneration from scratch helps the model break out of faulty reasoning. At inference time, the method performs RM-guided self-verification followed by a single RM-guided regeneration, avoiding both iterative correction and multi-sample selection. We evaluate our method on nine benchmarks spanning algorithmic, reasoning, symbolic, and domain-specific tasks, using both small- and large-scale LLMs. Experimental results show that our method outperforms prior methods while maintaining low computational cost.
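To make the inference pipeline concrete, here is a minimal Python sketch of the verify-then-regenerate loop described above. It is an illustration under assumptions, not the authors' implementation: `generate` is a hypothetical stand-in for an LLM call, `retrieve` uses naive keyword overlap in place of a real retriever, and the `Reflection` entries are invented toy examples of contrastive (faulty vs. corrective) memory items.

```python
"""Sketch of RM-guided self-verification and single-shot regeneration.

All names here (generate, retrieve, Reflection) are illustrative
placeholders, not the paper's actual interfaces.
"""
from dataclasses import dataclass


@dataclass
class Reflection:
    task_pattern: str     # the error-prone situation this entry covers
    faulty_step: str      # contrastive negative: what the flawed reasoning did
    corrective_hint: str  # contrastive positive: how to reason correctly


# Offline-curated contrastive Reflection Memory (toy entry for illustration).
REFLECTION_MEMORY = [
    Reflection(
        task_pattern="multi-step arithmetic word problem",
        faulty_step="applied the discount before summing the item prices",
        corrective_hint="compute the subtotal first, then apply percentage changes",
    ),
]


def generate(prompt: str) -> str:
    """Stand-in for an LLM call; replace with a real model client."""
    return f"<model output for: {prompt}>"


def retrieve(question: str, k: int = 1) -> list[Reflection]:
    """Toy retrieval via keyword overlap; a real system would use embeddings."""
    scored = sorted(
        REFLECTION_MEMORY,
        key=lambda r: -len(set(question.lower().split()) & set(r.task_pattern.split())),
    )
    return scored[:k]


def solve(question: str) -> str:
    # Step 1: produce an initial draft answer.
    draft = generate(f"Solve step by step:\n{question}")

    # Step 2: RM-guided self-verification. Retrieved contrastive pairs tell
    # the verifier which known failure modes to check the draft against.
    hints = retrieve(question)
    hint_text = "\n".join(
        f"- Known pitfall: {r.faulty_step}. Correct approach: {r.corrective_hint}"
        for r in hints
    )
    verdict = generate(
        f"Question:\n{question}\n\nCandidate answer:\n{draft}\n\n"
        f"Check the answer against these known pitfalls:\n{hint_text}\n"
        "Reply VALID or INVALID with a reason."
    )

    # Step 3: on failure, regenerate ONCE from scratch with corrective
    # guidance in context -- no iterative repair, no best-of-N sampling.
    if "INVALID" in verdict:
        return generate(
            f"Solve step by step, avoiding the pitfalls below:\n{hint_text}\n\n{question}"
        )
    return draft


if __name__ == "__main__":
    print(solve("A shop sells 3 items at $4 each with a 10% discount. Total?"))
```

The control flow mirrors the paradigm in the abstract: verification consults the memory to know what to look for, and a failed draft triggers exactly one guided regeneration rather than an iterative repair loop or N-way sampling.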
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reasoning | GSM8K | Accuracy | 0.872 | 106 |
| Symbolic Reasoning | Letter | Accuracy | 89.33 | 67 |
| Symbolic Reasoning | Last Letter Concatenation | Accuracy | 79.33 | 58 |
| Algorithmic Reasoning | MATH | Accuracy | 80.6 | 46 |
| Reasoning | Bamboogle | Accuracy | 73 | 46 |
| Mathematical Reasoning | GSM-Hard | Accuracy | 64 | 46 |
| Symbolic Reasoning | COIN | Accuracy | 85.75 | 45 |
| Reasoning | StrategyQA | Accuracy | 73 | 40 |
| Domain-specific Reasoning | LegalBench | Accuracy | 78.95 | 33 |
| Mathematical Reasoning | GSM-Hard | Accuracy | 68.6 | 28 |