EVIL: Evolving Interpretable Algorithms for Zero-Shot Inference on Event Sequences and Time Series with LLMs
About
We introduce EVIL (EVolving Interpretable algorithms with LLMs), an approach that uses LLM-guided evolutionary search to discover simple, interpretable algorithms for dynamical systems inference. Rather than training neural networks on large datasets, EVIL evolves pure Python/NumPy programs that perform zero-shot, in-context inference across datasets. We apply EVIL to three distinct tasks: next-event prediction in temporal point processes, rate matrix estimation for Markov jump processes, and time series imputation. In each case, a single evolved algorithm generalizes across all evaluation datasets without per-dataset training (analogous to an amortized inference model). To the best of our knowledge, this is the first work to show that LLM-guided program evolution can discover a single compact inference function for these dynamical-systems problems. Across the three domains, the discovered algorithms are often competitive with, and sometimes even outperform, state-of-the-art deep learning models, while being orders of magnitude faster and remaining fully interpretable.
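To make "a single compact inference function" in pure Python/NumPy concrete, here is a minimal sketch for the rate matrix estimation task. It is not an algorithm discovered by EVIL: it is the textbook maximum-likelihood estimator for a fully observed Markov jump process path, q_ij = N_ij / T_i, where N_ij counts jumps from state i to state j and T_i is the total time spent in state i. The function name `estimate_rate_matrix` and its input layout are assumptions made for illustration only.

```python
import numpy as np

def estimate_rate_matrix(times, states, n_states):
    """Textbook MLE of an MJP rate matrix from one fully observed path.

    Hypothetical helper for illustration; not the evolved EVIL program.
    times:  increasing observation times, shape (n,)
    states: state occupied from times[k] until times[k + 1], shape (n,)
    """
    times = np.asarray(times, dtype=float)
    states = np.asarray(states, dtype=int)
    holding = np.zeros(n_states)            # total time spent in each state
    jumps = np.zeros((n_states, n_states))  # number of i -> j transitions
    for k in range(len(times) - 1):
        i, j = states[k], states[k + 1]
        holding[i] += times[k + 1] - times[k]
        if i != j:
            jumps[i, j] += 1
    # q_ij = N_ij / T_i; states that were never occupied keep a zero row.
    rates = np.divide(
        jumps,
        holding[:, None],
        out=np.zeros_like(jumps),
        where=holding[:, None] > 0,
    )
    # Each row of a rate matrix sums to zero, so the diagonal is minus the off-diagonal sum.
    np.fill_diagonal(rates, -rates.sum(axis=1))
    return rates

Q = estimate_rate_matrix(times=[0.0, 1.0, 3.0, 4.0], states=[0, 1, 0, 1], n_states=2)
```

In this toy path the process spends 2 time units in each state, jumps twice from state 0 to state 1 and once back, so the estimate is `[[-1.0, 1.0], [0.5, -0.5]]`. A program of this size is the kind of object that can be read and checked line by line, which is the sense of "interpretable" the abstract relies on.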
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Event Prediction | StackOverflow | -- | 58 |
| Event Prediction | taxi | RMSE (Δt) 0.236 | 40 |
| Long-horizon prediction | AMAZON | RMSE (Δt) 0.289 | 26 |
| Event Prediction | Taxi (test) | OTD 8.324 | 22 |
| Event Prediction | RETWEET (test) | OTD 30.33 | 22 |
| Event Prediction | Amazon (test) | OTD 21.947 | 22 |
| Event Prediction | Taobao (test) | OTD 22.056 | 22 |
| Event Prediction | StackOverflow (test) | OTD 23.075 | 22 |
| Event Prediction | Taobao | RMSE (Δt) 0.13 | 21 |
| Time Series Imputation | GuangZhou Traffic 50% point-wise missing (train) | MAE 2.09 | 7 |