AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation
About
Recent large language model (LLM) agents have shown promise in using execution feedback for test-time adaptation. However, robust self-improvement remains far from solved: most approaches still treat each problem instance independently, without accumulating reusable knowledge. This limitation is particularly pronounced in domain-specific languages such as Triton, which are underrepresented in LLM pretraining data. Their strict constraints and non-linear optimization landscape further make naive generation and local refinement unreliable. We propose AdaExplore, an agent framework for performance-critical kernel code generation that self-improves from accumulated execution feedback through two complementary stages, failure-driven adaptation and diversity-preserving search, which jointly improve correctness and optimization performance without additional fine-tuning or external knowledge. In the adaptation stage, the agent synthesizes tasks and converts recurring failures into a reusable memory of validity rules, helping subsequent generations remain within the feasible set. In the search stage, the agent organizes candidate kernels as a tree and alternates between small local refinements and larger structural regenerations, allowing it to explore the optimization landscape beyond local optima. Experiments on kernel runtime optimization benchmarks demonstrate these gains: within 100 steps, AdaExplore achieves speedups of 3.12x on KernelBench Level-2 and 1.72x on Level-3, and it continues to improve with additional computation.
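The two stages can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in, not AdaExplore's implementation: a "kernel" is reduced to a `(BLOCK_SIZE, num_warps)` pair, `execute` is a toy cost function with a shallow local optimum at `(64, 2)` and a better one at `(512, 8)`, and `ValidityMemory` turns a failure signature into an active rule once it has been seen `threshold` times (the paper's memory is built from LLM-generated Triton kernels; here each rule is just a predicate). The search keeps every valid candidate in a tree, refines the current best on most steps, and on every `regen_every`-th step regenerates a candidate from a shuffled pool, attaching it to the root as a new branch. The names, the threshold of 2, the alternation schedule, and the cost function are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical toy "kernel": a (BLOCK_SIZE, num_warps) pair. 48 and 2048 are
# deliberately invalid so that some proposals fail "compilation".
BLOCKS = [16, 32, 48, 64, 128, 256, 512, 1024, 2048]
WARPS = [1, 2, 4, 8, 16]

# Each (hypothetical) failure signature maps to a predicate a valid config must satisfy.
RULE_CHECKS = {
    "block_not_power_of_two": lambda cfg: (cfg[0] & (cfg[0] - 1)) == 0,
    "block_exceeds_limit": lambda cfg: cfg[0] <= 1024,
}

def execute(cfg):
    """Toy stand-in for compile + benchmark: returns (runtime, failure_signature)."""
    for signature, check in RULE_CHECKS.items():
        if not check(cfg):
            return None, signature
    b, w = math.log2(cfg[0]), math.log2(cfg[1])
    local_basin = 0.5 + abs(b - 6) + abs(w - 1)  # shallow optimum at (64, 2)
    global_basin = abs(b - 9) + abs(w - 3)       # true optimum at (512, 8)
    return 1.0 + min(local_basin, global_basin), None

@dataclass
class ValidityMemory:
    """Failure-driven adaptation: a signature seen `threshold` times becomes a rule."""
    threshold: int = 2
    counts: Counter = field(default_factory=Counter)

    def record(self, signature):
        self.counts[signature] += 1

    def rules(self):
        return sorted(s for s, n in self.counts.items() if n >= self.threshold)

    def allows(self, cfg):
        # Screen a candidate against learned rules before paying for execution.
        return all(RULE_CHECKS[s](cfg) for s in self.rules())

@dataclass
class Node:
    cfg: tuple
    runtime: float
    children: list = field(default_factory=list)

def refine(cfg, rng):
    """Small local edit: move one knob to a neighbouring value."""
    bi, wi = BLOCKS.index(cfg[0]), WARPS.index(cfg[1])
    if rng.random() < 0.5:
        bi = min(max(bi + rng.choice((-1, 1)), 0), len(BLOCKS) - 1)
    else:
        wi = min(max(wi + rng.choice((-1, 1)), 0), len(WARPS) - 1)
    return (BLOCKS[bi], WARPS[wi])

def search(steps=100, regen_every=5, seed=0):
    rng = random.Random(seed)
    memory = ValidityMemory()
    # Regeneration draws from a shuffled pool without replacement, so new
    # branches start from structurally different points.
    pool = [(b, w) for b in BLOCKS for w in WARPS]
    rng.shuffle(pool)
    root = Node((64, 2), execute((64, 2))[0])
    nodes = [root]
    for step in range(1, steps + 1):
        best = min(nodes, key=lambda n: n.runtime)
        if step % regen_every == 0:
            parent, cfg = root, pool[(step // regen_every - 1) % len(pool)]
        else:
            parent, cfg = best, refine(best.cfg, rng)
        if not memory.allows(cfg):
            continue  # pruned by a learned validity rule; nothing executed
        runtime, failure = execute(cfg)
        if failure is not None:
            memory.record(failure)
            continue
        child = Node(cfg, runtime)
        parent.children.append(child)  # keep every valid candidate in the tree
        nodes.append(child)
    return min(nodes, key=lambda n: n.runtime), memory

if __name__ == "__main__":
    best, memory = search(steps=300)
    print(best.cfg, best.runtime, memory.rules())
    stuck, _ = search(steps=50, regen_every=1000)  # refinement only
    print(stuck.cfg, stuck.runtime)  # (64, 2) 1.5
```

With regeneration, `search(steps=300)` reaches `(512, 8)` because the shuffled pool proposes every configuration within the first 225 steps. With `regen_every` larger than `steps`, the same loop only refines and stays at `(64, 2)`, since every neighbour of that point is either slower or invalid. In both runs a failure signature is executed at most twice before its rule starts pruning matching candidates, which is the saving that converting failures into memory is meant to provide.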
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Kernel Code Generation | KernelBench Level 3 | Accuracy | 100 | 14 |
| Kernel Code Generation | KernelBench Level 2 | Accuracy | 100 | 14 |
| Triton kernel generation | TritonBench T | Accuracy | 97 | 4 |
| Conv2d kernel generation | SM120 | Latency (ms) | 0.076 | 4 |
| GEMM kernel generation | SM120 | Latency (ms) | 0.3896 | 4 |
| Conv2d kernel generation | SM90 | Latency (ms) | 0.0684 | 4 |
| GEMM kernel generation | SM90 | Latency (ms) | 0.3198 | 4 |
| Top-K kernel generation | SM90 | Latency (ms) | 0.1338 | 4 |
| GDN kernel generation | SM90 | Latency (ms) | 17.7673 | 3 |
| Kernel Runtime Optimization | FlashInfer-Bench fused_add_rmsnorm (test) | Correct count | 16 | 1 |