Regression Language Models for Code

About

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) using a frozen LLM encoder can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM based on T5Gemma obtains >0.9 Spearman rank correlation on competitive programming submissions from APPS, and a single unified model achieves >0.5 average Spearman rank correlation across 24 different programming languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predicts architecture latencies on numerous hardware platforms.

Yash Akhauri, Xingyou Song, Arissa Wongpanich, Bryan Lewandowski, Mohamed S. Abdelfattah • 2025

Related benchmarks

Task	Dataset	Result
Neural Architecture Search (Performance Prediction)	NASNet (test)	Kendall's Tau: 0.382	6
Neural Architecture Search (Performance Prediction)	Amoeba (test)	Kendall's Tau: 0.488	6
Neural Architecture Search (Performance Prediction)	ENAS (test)	Kendall's Tau: 0.481	6
Neural Architecture Search (Performance Prediction)	DARTS (test)	Kendall's Tau: 0.528	6
Neural Architecture Search (Performance Prediction)	PNAS (test)	Kendall's Tau: 0.427	6
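
The results on this page are rank correlations between predicted and measured metrics. As a rough illustration of what Kendall's Tau and Spearman rank correlation measure, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names and the sample values are hypothetical, and both functions assume the inputs contain no ties.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) / total pairs. Assumes no ties."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both sequences order items i and j the same way.
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


def ranks(values):
    """Return 1-based ranks of the values (smallest value gets rank 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(predicted, actual):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    squared_diffs = sum(
        (p - a) ** 2 for p, a in zip(ranks(predicted), ranks(actual))
    )
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical example: measured vs. predicted accuracy of five architectures.
measured_accuracy = [0.91, 0.93, 0.89, 0.95, 0.90]
predicted_accuracy = [0.90, 0.94, 0.91, 0.93, 0.89]

tau = kendall_tau(predicted_accuracy, measured_accuracy)
rho = spearman_rho(predicted_accuracy, measured_accuracy)
```

Both metrics depend only on how the predictions order the candidates, not on the predicted values themselves. That suits a setting like architecture search, where the aim is to pick the better candidates.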

