EVIL: Evolving Interpretable Algorithms for Zero-Shot Inference on Event Sequences and Time Series with LLMs

About

We introduce EVIL (EVolving Interpretable algorithms with LLMs), an approach that uses LLM-guided evolutionary search to discover simple, interpretable algorithms for dynamical systems inference. Rather than training neural networks on large datasets, EVIL evolves pure Python/NumPy programs that perform zero-shot, in-context inference across datasets. We apply EVIL to three distinct tasks: next-event prediction in temporal point processes, rate matrix estimation for Markov jump processes, and time series imputation. In each case, a single evolved algorithm generalizes across all evaluation datasets without per-dataset training (analogous to an amortized inference model). To the best of our knowledge, this is the first work to show that LLM-guided program evolution can discover a single compact inference function for these dynamical-systems problems. Across the three domains, the discovered algorithms are often competitive with, and in some cases outperform, state-of-the-art deep learning models while being orders of magnitude faster and remaining fully interpretable.
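To make the idea of evolving a single inference program concrete, here is a minimal sketch of an evolutionary program-search loop, assuming time series imputation as the task. Candidates are Python/NumPy source strings that define a hypothetical `impute(x)` function; each is scored by mean absolute error on masked points averaged over several synthetic datasets, so one program has to work on all of them without refitting. The LLM mutation step is replaced by a hypothetical `propose_variant` that samples from a fixed pool of hand-written programs. The truncation selection, the synthetic data, and every name below are illustrative assumptions, not EVIL's actual implementation.

```python
import numpy as np

# Hypothetical seed program: fills every missing point (NaN) with the series mean.
SEED_PROGRAM = """
def impute(x):
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out
"""

# Hypothetical pool standing in for LLM-proposed rewrites of a parent program.
VARIANT_POOL = [
    """
def impute(x):
    # Linear interpolation between the nearest observed neighbours.
    out = x.copy()
    idx = np.arange(len(x))
    obs = ~np.isnan(x)
    out[~obs] = np.interp(idx[~obs], idx[obs], x[obs])
    return out
""",
    """
def impute(x):
    # Fill every gap with the median of the observed values.
    out = x.copy()
    out[np.isnan(out)] = np.nanmedian(x)
    return out
""",
]


def propose_variant(parent_src, rng):
    """Stand-in for the LLM mutation step (assumption): ignores the parent
    and samples a program from the fixed pool."""
    return VARIANT_POOL[rng.integers(len(VARIANT_POOL))]


def compile_program(src):
    """Execute candidate source and return its impute function.
    A real system should sandbox this instead of calling exec directly."""
    namespace = {"np": np}
    exec(src, namespace)
    return namespace["impute"]


def score(src, datasets):
    """Mean absolute error on masked points, averaged over all datasets,
    so one program has to work on every dataset without refitting."""
    impute = compile_program(src)
    errors = []
    for x_true, mask in datasets:
        x_obs = np.where(mask, np.nan, x_true)
        x_hat = impute(x_obs)
        errors.append(np.mean(np.abs(x_hat[mask] - x_true[mask])))
    return float(np.mean(errors))


def make_datasets(rng, n_sets=3, length=200, missing=0.5):
    """Synthetic noisy sine waves with point-wise missing values."""
    datasets = []
    t = np.linspace(0.0, 4.0 * np.pi, length)
    for k in range(n_sets):
        x = np.sin((k + 1) * t) + 0.05 * rng.standard_normal(length)
        mask = rng.random(length) < missing
        mask[0] = mask[-1] = False  # keep endpoints observed
        datasets.append((x, mask))
    return datasets


def evolve(datasets, generations=10, pop_size=4, seed=0):
    """Truncation-selection loop: pick a parent, propose a child, keep the
    pop_size lowest-error programs. Returns (best_score, best_source)."""
    rng = np.random.default_rng(seed)
    population = [(score(SEED_PROGRAM, datasets), SEED_PROGRAM)]
    for _ in range(generations):
        _, parent = population[rng.integers(len(population))]
        child = propose_variant(parent, rng)
        population.append((score(child, datasets), child))
        population.sort(key=lambda p: p[0])  # lower MAE is better
        population = population[:pop_size]
    return population[0]


if __name__ == "__main__":
    data = make_datasets(np.random.default_rng(0))
    best_score, best_src = evolve(data)
    print(f"best mean MAE: {best_score:.3f}")
    print(best_src)
```

Keeping candidates as plain source is what makes the winning program directly readable. In a fuller version, the proposal step would presumably prompt an LLM with the parent program and its score, and generated code should run in a sandbox rather than through a bare `exec`.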

David Berghaus • 2026

Related benchmarks

Task	Dataset	Result
Event Prediction	StackOverflow	--	58
Event Prediction	RETWEET (test)	OTD 30.33	55
Event Prediction	Taxi (test)	OTD 8.324	55
Event Prediction	Amazon (test)	OTD 21.947	55
Event Prediction	Taobao (test)	OTD 22.056	55
Event Prediction	StackOverflow (test)	OTD 23.075	55
Event Prediction	taxi	RMSE (Δt) 0.236	47
Long-horizon prediction	AMAZON	OTD 10.93	26
Event Prediction	Taobao	RMSE (Δt) 0.13	21
Time Series Imputation	GuangZhou Traffic 50% point-wise missing (train)	MAE 2.09	7