ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
About
How well do AI systems perform at algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they demonstrate high performance on specific problems, a notable gap remains compared to humans in terms of consistency across problems and long-horizon problem-solving capabilities. This gap highlights the need for benchmarks like ALE-Bench to drive future AI advances.
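To make the interaction pattern concrete, the sketch below shows the kind of iterative refinement loop the framework encourages: an agent drafts a solution, runs it on public test cases to obtain scores and visualizations, revises, and finally submits its best-scoring version. All names here (`Session`, `Agent`, `run_public_tests`, `submit`, and so on) are hypothetical stand-ins to illustrate the workflow, not the actual ALE-Bench API.

```python
"""Minimal sketch of a long-horizon refinement loop for a score-based
contest. Every interface below is a hypothetical stand-in, NOT the
actual ALE-Bench API."""

from dataclasses import dataclass
from typing import Protocol


@dataclass
class TestResult:
    scores: list[float]          # per-test-case scores from the judge
    visualizations: list[bytes]  # rendered views of solution behavior

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


class Session(Protocol):
    """Hypothetical contest session: statement, local runs, submission."""
    problem_statement: str

    def time_remaining(self) -> float: ...
    def run_public_tests(self, code: str) -> TestResult: ...
    def submit(self, code: str) -> None: ...


class Agent(Protocol):
    """Hypothetical agent: drafts a solution, then revises on feedback."""
    def draft_solution(self, statement: str) -> str: ...
    def revise(self, code: str, result: TestResult) -> str: ...


def refine(agent: Agent, session: Session) -> None:
    """Iteratively improve a solution, keeping the best-scoring version."""
    code = agent.draft_solution(session.problem_statement)
    best_score, best_code = float("-inf"), code

    while session.time_remaining() > 0:
        result = session.run_public_tests(code)  # scores + visual feedback
        if result.mean_score > best_score:
            best_score, best_code = result.mean_score, code
        code = agent.revise(code, result)        # long-horizon iteration

    session.submit(best_code)  # final answer, scored on hidden test cases
```

The key design point this loop reflects is that, unlike pass/fail benchmarks, the objective is a score to maximize, so an agent benefits from tracking its best candidate and spending its remaining time budget on revision rather than stopping at the first working solution.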
Results on related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Geometry | AtCoder Heuristic Contest ahc039 (official leaderboard) | Score: 5.51e+5 | 9 |
| Scheduling | AtCoder Heuristic Contest ahc058 (official leaderboard) | Score: 8.48e+8 | 8 |
| Competitive Programming Agent Evaluation | ALE-Bench | Final performance: 1.88e+3 | 4 |