
ASI-Evolve: AI Accelerates AI

About

Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic framework for AI-for-AI research that closes this loop through a learn-design-experiment-analyze cycle. ASI-Evolve augments standard evolutionary agents with two key components: a cognition base that injects accumulated human priors into each round of exploration, and a dedicated analyzer that distills complex experimental outcomes into reusable insights for future iterations. To our knowledge, ASI-Evolve is the first unified framework to demonstrate AI-driven discovery across three central components of AI development: data, architectures, and learning algorithms. In neural architecture design, it discovered 105 SOTA linear attention architectures, with the best discovered model surpassing DeltaNet by +0.97 points, nearly 3x the gain of recent human-designed improvements. In pretraining data curation, the evolved pipeline improves average benchmark performance by +3.96 points, with gains exceeding 18 points on MMLU. In reinforcement learning algorithm design, discovered algorithms outperform GRPO by up to +12.5 points on AMC32, +11.67 points on AIME24, and +5.04 points on OlympiadBench. We further provide initial evidence that this AI-for-AI paradigm can transfer beyond the AI stack through experiments in mathematics and biomedicine. Together, these results suggest that ASI-Evolve represents a promising step toward enabling AI to accelerate AI across the foundational stages of development, offering early evidence for the feasibility of closed-loop AI research.
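The learn-design-experiment-analyze cycle described above can be sketched as a minimal closed loop. This is an illustrative toy only: the function names, the numeric search task, and the hill-climbing heuristic are all assumptions for exposition, not the actual ASI-Evolve system, which evolves architectures, data pipelines, and RL algorithms with LLM agents.

```python
import random

def design(cognition_base, rng):
    # "Design": propose a candidate biased by accumulated priors/insights.
    base = cognition_base[-1] if cognition_base else 0.0
    return base + rng.uniform(-1.0, 1.0)

def experiment(candidate):
    # "Experiment": evaluate the candidate (toy objective peaking at 3.0).
    return -abs(candidate - 3.0)

def analyze(candidate, score, cognition_base):
    # "Analyze": distill the outcome into a reusable insight for later rounds;
    # here, keep the candidate only if it beats the best insight so far.
    best_so_far = cognition_base[-1] if cognition_base else 0.0
    if score > experiment(best_so_far):
        cognition_base.append(candidate)

def evolve(rounds=200, seed=0):
    rng = random.Random(seed)
    cognition_base = []  # accumulated priors / distilled insights
    for _ in range(rounds):
        candidate = design(cognition_base, rng)
        score = experiment(candidate)
        analyze(candidate, score, cognition_base)
    return cognition_base[-1]

print(evolve())
```

The key structural point mirrored from the abstract is that the analyzer writes back into the cognition base, so each round of design starts from accumulated insight rather than from scratch.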

Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu • 2026

Related benchmarks

Task                                    | Dataset        | Result            | Rank
Commonsense Reasoning                   | WinoGrande     | --                | 1085
Question Answering                      | ARC Easy       | --                | 597
Physical Interaction Question Answering | PIQA           | Accuracy: 76.8    | 333
Logical Reasoning                       | BBH            | --                | 201
Bias Evaluation                         | BBQ            | Accuracy: 31.46   | 113
Social Commonsense Reasoning            | SocialIQA      | Accuracy: 43.58   | 100
Multitask Knowledge                     | MMLU           | Accuracy: 46.13   | 53
Drug-Target Interaction Prediction      | BIOSNAP        | --                | 28
Reasoning                               | DROP           | Score: 19.48      | 27
Language Modeling                       | LAMBADA (dev)  | Perplexity: 12.34 | 20
Showing 10 of 37 rows

Other info

GitHub

Follow for updates