Implementation and Optimization of HQC Decoding on NPU-Integrated Devices
About
Hamming Quasi-Cyclic (HQC) has been selected by NIST for standardization as an additional code-based key-encapsulation mechanism, providing algorithmic diversity alongside lattice-based post-quantum cryptography. Efficient deployment of HQC on mobile and embedded platforms, however, requires careful optimization of its decoding procedure, whose Reed-Muller and Reed-Solomon components dominate the computational cost. This paper studies HQC decoding on Qualcomm Hexagon processors in NPU-integrated devices, focusing on the Hexagon Vector eXtensions (HVX) backend rather than a tensor-inference engine. We observe that HQC decoding naturally exposes vector-structured computation, including Reed-Muller reliability vectors, Hadamard-transform coefficients, Reed-Solomon syndrome vectors, finite-field products, and packed support-point evaluations. Based on this observation, we redesign the dominant decoding kernels around HVX-friendly data layouts and execution patterns, including a vectorized Reed-Muller Hadamard transform, scalar-equivalent peak selection, HVX-oriented finite-field arithmetic, vectorized syndrome computation, and shortened-support locator-root evaluation. We implement and evaluate the optimized decoder using both Hexagon simulator measurements and real-device experiments on a Snapdragon 8 Gen 2 hardware development kit. The results show that Hexagon/HVX-assisted decoding substantially reduces latency and energy consumption, improving energy efficiency by up to $18.13\times$ while significantly offloading host CPU work. These results indicate that NPU-integrated mobile platforms can serve as effective backends for structured post-quantum cryptographic decoding when the underlying kernels are reformulated around vector execution.
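To make the Reed-Muller step named in the abstract concrete, the following is a minimal scalar sketch of first-order Reed-Muller decoding through a fast Hadamard transform followed by peak selection. It is not the paper's HVX kernel: the function names (`rm_encode`, `hadamard`, `rm_decode`) are hypothetical, the code length of 128 and the 8-bit message are assumptions chosen to match a first-order Reed-Muller code of that size, the codeword duplication used in HQC is omitted, and the lowest-index tie-breaking rule is an assumption rather than something the abstract states.

```python
def rm_encode(m):
    """Encode an 8-bit message as a 128-bit first-order Reed-Muller codeword.

    Bits 0..6 of m select a linear form over the 7-bit position index j;
    bit 7 of m is the constant term (it complements the whole codeword).
    """
    a, b = m & 0x7F, (m >> 7) & 1
    return [(bin(a & j).count("1") + b) & 1 for j in range(128)]


def hadamard(v):
    """Fast Walsh-Hadamard transform of a list whose length is a power of two."""
    v = list(v)  # work on a copy so the caller's data is untouched
    h = 1
    while h < len(v):
        # One butterfly stage: combine elements that are h apart.
        for start in range(0, len(v), 2 * h):
            for k in range(start, start + h):
                x, y = v[k], v[k + h]
                v[k], v[k + h] = x + y, x - y
        h *= 2
    return v


def rm_decode(bits):
    """Decode 128 received bits to the most likely 8-bit message."""
    # Map bit 0 -> +1 and bit 1 -> -1 so the transform yields correlations.
    t = hadamard([1 - 2 * c for c in bits])
    # Peak selection: largest magnitude; max() keeps the lowest index on ties
    # (assumed tie rule for this sketch).
    peak = max(range(128), key=lambda i: abs(t[i]))
    # A negative peak means the complemented codeword was sent (bit 7 set).
    return peak | (128 if t[peak] < 0 else 0)
```

Calling `rm_decode(rm_encode(m))` returns `m` for any byte `m`, and the result stays the same when fewer than 32 of the 128 bits are flipped, since the code has minimum distance 64. Each butterfly stage applies the same add/subtract pattern across the whole vector, which is the kind of regularity that a vector backend can exploit.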
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| HQC decoding | HQC-128 Decoding Corpus 256-fixture | Δ Pcycles | 2.12e+7 | 4 |
| Decoding | HQC-128 | Latency (µs/decode) | 39.173 | 2 |
| Decoding | HQC-192 | Latency (µs/decode) | 56.065 | 2 |
| Decoding | HQC-256 | Latency (µs/decode) | 116.3 | 2 |
| HQC decoding | 256-fixture decoding corpus HQC-192 | Δ Pcycles | 2.57e+7 | 2 |