AERO: Autonomous Evolutionary Reasoning Optimization via Endogenous Dual-Loop Feedback

About

Large Language Models (LLMs) have achieved significant success in complex reasoning but remain bottlenecked by reliance on expert-annotated data and external verifiers. While existing self-evolution paradigms aim to bypass these constraints, they often fail to identify the optimal learning zone and risk reinforcing collective hallucinations and incorrect priors through flawed internal feedback. To address these challenges, we propose Autonomous Evolutionary Reasoning Optimization (AERO), an unsupervised framework that achieves autonomous reasoning evolution by internalizing self-questioning, answering, and criticism within a synergistic dual-loop system. Inspired by the Zone of Proximal Development (ZPD) theory, AERO utilizes entropy-based positioning to target the "solvability gap" and employs Independent Counterfactual Correction for robust verification. Furthermore, we introduce a Staggered Training Strategy to synchronize capability growth across functional roles and prevent curriculum collapse. Extensive evaluations across nine benchmarks spanning three domains demonstrate that AERO achieves average performance improvements of 4.57% on Qwen3-4B-Base and 5.10% on Qwen3-8B-Base, outperforming competitive baselines. Code is available at https://github.com/mira-ai-lab/AERO.

Zhitao Gao, Jie Ma, Xuhong Li, Pengyu Li, Ning Qu, Yaqiang Wu, Hui Liu, Jun Liu • 2026
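
The abstract does not say how entropy-based positioning is computed, so the following is only a minimal illustrative sketch of one way such a filter could work, not the authors' implementation. It assumes that several answers are sampled per self-generated question, that the Shannon entropy of the sampled answers measures how uncertain the model is, and that questions with mid-range entropy (neither trivially solved nor hopelessly inconsistent) fall in the "solvability gap". The function names `answer_entropy` and `in_learning_zone` and the thresholds `low` and `high` are hypothetical.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical distribution of sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def in_learning_zone(answers, low=0.3, high=0.9):
    """Return True if the normalized answer entropy lies in [low, high].

    The band limits are assumed values for illustration only; the abstract
    does not give thresholds.
    """
    if len(answers) < 2:
        return False  # entropy cannot be normalized with fewer than two samples
    max_entropy = math.log2(len(answers))  # reached when every sample differs
    normalized = answer_entropy(answers) / max_entropy
    return low <= normalized <= high


# Eight sampled answers to one hypothetical question.
mixed = ["4", "4", "4", "4", "5", "5", "6", "7"]
print(answer_entropy(mixed))    # partial agreement gives mid-range entropy
print(in_learning_zone(mixed))  # kept as a training candidate under these assumptions
```

Under these assumptions, a question on which all samples agree has zero entropy and is dropped as too easy, while one on which every sample differs has maximal entropy and is dropped as too hard.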

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	--	499
General Reasoning	MMLU-Pro	Pass@1 Accuracy 62.8	115
Mathematical Reasoning	AMC	Pass@1 62.7	112
General Reasoning	Super GPQA	--	99
General Reasoning	GPQA Diamond	Pass@1 Accuracy 38.4	57
Mathematical Reasoning	MATH500	Pass@1 Accuracy 82.2	16
Physical Reasoning	UGPhysics	Pass@1 Accuracy 21.7	12
Physical Reasoning	PhysicsEval	Pass@1 Accuracy 87.9	12
Physical Reasoning	PHYBench	Pass@1 Accuracy 5.3	12
