Neural architectures for resolving references in program code
About
Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures struggle on them. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples that are ten times longer than those the best baseline can handle. We also measure the impact of our architecture on the real-world task of decompiling switch statements, which has an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%. Multiple ablation studies show that all components of our architectures are essential.
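The abstract does not spell out the two indexing tasks, so the following is only a minimal sketch of one plausible reading, not the paper's own definition. It assumes that direct indexing means selecting values by position according to a permutation, and that indirect indexing means the references are labels that must first be resolved through a lookup table. The function names `direct_index` and `indirect_index` and the example labels are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def direct_index(values: Sequence[T], perm: Sequence[int]) -> List[T]:
    """Assumed 'direct' task: perm[i] is the position in `values` to emit at slot i."""
    if sorted(perm) != list(range(len(values))):
        raise ValueError("perm must be a permutation of 0..len(values)-1")
    return [values[p] for p in perm]


def indirect_index(
    labelled: Sequence[Tuple[Hashable, T]], refs: Sequence[Hashable]
) -> List[T]:
    """Assumed 'indirect' task: each value carries a label, and `refs` lists
    labels in the desired output order, so every reference needs a lookup."""
    table = dict(labelled)
    if len(table) != len(labelled):
        raise ValueError("labels must be unique")
    if len(refs) != len(table) or set(refs) != set(table):
        raise ValueError("refs must mention every label exactly once")
    return [table[r] for r in refs]


# Positions are used directly as references.
direct_result = direct_index(["a", "b", "c"], [2, 0, 1])

# Labels (hypothetical, loosely in the spirit of jump-table targets) are
# resolved first, then the referenced values are emitted.
indirect_result = indirect_index(
    [("L7", "a"), ("L3", "b"), ("L9", "c")], ["L9", "L7", "L3"]
)
```

Under this reading, both calls produce the same reordering of the values; the indirect variant differs only in that each reference has to be matched against a label before the value can be copied, which is the extra step a sequence-to-sequence model would have to learn.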
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Indirect permutation problem | PI1-10 | TA 99.99 | 5 |
| Indirect permutation problem | PI10 | TA 100 | 5 |
| Direct permutation problem | PD1-10 | TA 99.99 | 5 |
| Direct permutation problem | PD10 | TA 100 | 5 |
| Direct permutation problem | PD20 | TA 99.95 | 2 |
| Indirect permutation problem | PI_DICT (test) | TA 99.94 | 2 |
| Indirect permutation problem | PI20 | TA 99.99 | 2 |
| Direct permutation problem | PD40 (test) | TA 99.99 | 1 |
| Direct permutation problem | PD100 (test) | TA 99.99 | 1 |
| Indirect permutation problem | PI40 | TA 100 | 1 |