AERO: Autonomous Evolutionary Reasoning Optimization via Endogenous Dual-Loop Feedback

About

Large Language Models (LLMs) have achieved significant success in complex reasoning but remain bottlenecked by reliance on expert-annotated data and external verifiers. While existing self-evolution paradigms aim to bypass these constraints, they often fail to identify the optimal learning zone and risk reinforcing collective hallucinations and incorrect priors through flawed internal feedback. To address these challenges, we propose Autonomous Evolutionary Reasoning Optimization (AERO), an unsupervised framework that achieves autonomous reasoning evolution by internalizing self-questioning, answering, and criticism within a synergistic dual-loop system. Inspired by the Zone of Proximal Development (ZPD) theory, AERO utilizes entropy-based positioning to target the "solvability gap" and employs Independent Counterfactual Correction for robust verification. Furthermore, we introduce a Staggered Training Strategy to synchronize capability growth across functional roles and prevent curriculum collapse. Extensive evaluations across nine benchmarks spanning three domains demonstrate that AERO achieves average performance improvements of 4.57% on Qwen3-4B-Base and 5.10% on Qwen3-8B-Base, outperforming competitive baselines. Code is available at https://github.com/mira-ai-lab/AERO.
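The abstract does not say how entropy-based positioning is computed, so the following is only an illustrative sketch of one plausible reading: sample several answers to a self-generated question, measure the Shannon entropy of the answer distribution, and keep questions whose entropy falls in a middle band (neither trivially solved nor hopelessly inconsistent). The function names, the `low`/`high` thresholds, and the sample data are all hypothetical and are not taken from the AERO paper or its code.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical distribution of sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def in_learning_zone(answers, low=0.5, high=1.5):
    """Hypothetical filter: keep a question when its answer entropy lies in [low, high].

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return low <= answer_entropy(answers) <= high


# Made-up sampled answers for three self-generated questions.
samples = {
    "too_easy": ["4", "4", "4", "4"],   # all samples agree: entropy 0
    "mid": ["7", "7", "9", "7"],        # partial agreement: intermediate entropy
    "too_hard": ["1", "2", "3", "4"],   # no agreement: maximal entropy for 4 samples
}

selected = [q for q, answers in samples.items() if in_learning_zone(answers)]
print(selected)
```

Under these assumed thresholds, only the question with partial agreement among samples is retained, which mirrors the ZPD intuition of training on problems just beyond current ability.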

Zhitao Gao, Jie Ma, Xuhong Li, Pengyu Li, Ning Qu, Yaqiang Wu, Hui Liu, Jun Liu • 2026

Related benchmarks

Task                    Dataset       Result                 Rank
Mathematical Reasoning  GSM8K         --                     351
Mathematical Reasoning  AMC           Pass@1: 62.7           112
General Reasoning       MMLU-Pro      Pass@1 Accuracy: 62.8  27
Mathematical Reasoning  MATH500       Pass@1 Accuracy: 82.2  16
General Reasoning       Super GPQA    Pass@1 Accuracy: 32.5  16
General Reasoning       GPQA Diamond  Pass@1 Accuracy: 38.4  16
Physical Reasoning      UGPhysics     Pass@1 Accuracy: 21.7  12
Physical Reasoning      PhysicsEval   Pass@1 Accuracy: 87.9  12
Physical Reasoning      PHYBench      Pass@1 Accuracy: 5.3   12
