ASI-Evolve: AI Accelerates AI
About
Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic framework for AI-for-AI research that closes this loop through a learn-design-experiment-analyze cycle. ASI-Evolve augments standard evolutionary agents with two key components: a cognition base that injects accumulated human priors into each round of exploration, and a dedicated analyzer that distills complex experimental outcomes into reusable insights for future iterations. To our knowledge, ASI-Evolve is the first unified framework to demonstrate AI-driven discovery across three central components of AI development: data, architectures, and learning algorithms. In neural architecture design, it discovered 105 SOTA linear attention architectures, with the best discovered model surpassing DeltaNet by +0.97 points, nearly 3x the gain of recent human-designed improvements. In pretraining data curation, the evolved pipeline improves average benchmark performance by +3.96 points, with gains exceeding 18 points on MMLU. In reinforcement learning algorithm design, discovered algorithms outperform GRPO by up to +12.5 points on AMC32, +11.67 points on AIME24, and +5.04 points on OlympiadBench. We further provide initial evidence that this AI-for-AI paradigm can transfer beyond the AI stack through experiments in mathematics and biomedicine. Together, these results suggest that ASI-Evolve represents a promising step toward enabling AI to accelerate AI across the foundational stages of development, offering early evidence for the feasibility of closed-loop AI research.
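As a rough illustration of the learn-design-experiment-analyze cycle described above, the following toy sketch runs the four stages on a scalar optimization problem. It is not the ASI-Evolve implementation: the objective, the `cognition_base` dictionary, the `analyze` heuristic, and all function names are hypothetical stand-ins chosen only to show how priors and distilled insights might feed back into each round.

```python
import random


def evaluate(x):
    """Experiment: a toy objective standing in for a costly training run."""
    return -(x - 3.0) ** 2


def analyze(parent, child):
    """Analyzer (hypothetical): distill one outcome into a reusable insight.

    The insight here is just the direction the next design should move in.
    """
    improved = child["score"] > parent["score"]
    moved_up = child["x"] > parent["x"]
    # Keep the direction if the move helped, reverse it otherwise.
    direction = 1.0 if moved_up == improved else -1.0
    return {"direction": direction, "improved": improved}


def run_loop(rounds=50, seed=0):
    rng = random.Random(seed)
    # Cognition base (hypothetical): fixed priors supplied before the search.
    cognition_base = {"init_x": 0.0, "step": 0.5}
    insights = []

    best = {"x": cognition_base["init_x"]}
    best["score"] = evaluate(best["x"])

    for _ in range(rounds):
        # Learn: combine priors with the most recent insight, if any.
        if insights:
            direction = insights[-1]["direction"]
        else:
            direction = rng.choice([-1.0, 1.0])
        # Design: propose a new candidate near the current best.
        child_x = best["x"] + direction * cognition_base["step"] * rng.random()
        # Experiment: score the candidate.
        child = {"x": child_x, "score": evaluate(child_x)}
        # Analyze: store an insight for the next round.
        insights.append(analyze(best, child))
        if child["score"] > best["score"]:
            best = child

    return best, insights


if __name__ == "__main__":
    best, insights = run_loop()
    print(f"best x = {best['x']:.3f}, score = {best['score']:.4f}")
    print(f"insights recorded: {len(insights)}")
```

In this sketch the analyzer's output is the only channel through which one round influences the next design, which mirrors, in a very reduced form, the role the abstract assigns to reusable insights.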
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | -- | 1085 |
| Question Answering | ARC Easy | -- | 597 |
| Physical Interaction Question Answering | PIQA | Accuracy 76.8 | 333 |
| Logical Reasoning | BBH | -- | 201 |
| Bias Evaluation | BBQ | Accuracy 31.46 | 113 |
| Social Commonsense Reasoning | SocialIQA | Accuracy 43.58 | 100 |
| Multitask Knowledge | MMLU | Accuracy 46.13 | 53 |
| Drug-Target Interaction Prediction | BIOSNAP | -- | 28 |
| Reasoning | DROP | Score 19.48 | 27 |
| Language Modeling | LAMBADA (dev) | Perplexity 12.34 | 20 |