Neural architectures for resolving references in program code

About

Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures struggle on them. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples ten times longer than those the best baseline can handle. We measure the impact of our architecture on the real-world task of decompiling switch statements, which includes an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%. Multiple ablation studies show that all components of our architectures are essential.
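
The abstract names the two tasks without defining them. As a rough illustration only, the sketch below shows one plausible reading of "direct" and "indirect" indexing by permutation as plain functions. The function names `direct_index` and `indirect_index`, the argument layout, and the key-lookup interpretation of the indirect case are all assumptions made for this example. They are not the authors' formulation, benchmark format, or model.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def direct_index(values: Sequence[T], perm: Sequence[int]) -> List[T]:
    """Assumed 'direct' task: output position i holds values[perm[i]]."""
    if sorted(perm) != list(range(len(values))):
        raise ValueError("perm must be a permutation of 0..len(values)-1")
    return [values[p] for p in perm]


def indirect_index(
    keys: Sequence[str], values: Sequence[T], queries: Sequence[str]
) -> List[T]:
    """Assumed 'indirect' task: each query names a key, so the key's
    position must be found first and only then is the value read."""
    if len(keys) != len(values) or len(set(keys)) != len(keys):
        raise ValueError("keys must be unique and aligned with values")
    # Hypothetical lookup table from key to its position in the input.
    position = {key: i for i, key in enumerate(keys)}
    return [values[position[q]] for q in queries]


# Direct: the permutation gives positions explicitly.
reordered = direct_index(["a", "b", "c"], [2, 0, 1])

# Indirect: the permutation is given through key names, not positions.
resolved = indirect_index(["x", "y", "z"], [10, 20, 30], ["z", "x", "y"])
```

Under this reading, the indirect variant needs one extra resolution step per element, which loosely mirrors resolving a named reference in code before following it.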

Gergő Szalay, Gergely Zsolt Kovács, Sándor Teleki, Balázs Pintér, Tibor Gregorics • 2026

Related benchmarks

Task	Dataset	Result
Indirect permutation problem	PI1-10	TA 99.99	5
Indirect permutation problem	PI10	TA 100	5
Direct permutation problem	PD1-10	TA 99.99	5
Direct permutation problem	PD10	TA 100	5
Direct permutation problem	PD20	TA 99.95	2
Indirect permutation problem	PI_DICT (test)	TA 99.94	2
Indirect permutation problem	PI20	TA 99.99	2
Direct permutation problem	PD40 (test)	TA 99.99	1
Direct permutation problem	PD100 (test)	TA 99.99	1
Indirect permutation problem	PI40	TA 100	1

Showing 10 of 11 rows
