Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training
About
Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the neuron level. We propose Neuron-Level Mixed-Precision QAT (NMP-QAT), where each neuron independently learns its own discrete precision during training. Starting from low-bit precision, NMP-QAT expands bit-width only when training signals demand it, via differentiable surrogates and straight-through estimators, while preserving a fully discrete inference graph. This adaptability extends to both weights and activations, reducing memory movement. Evaluated on telecom and non-telecom datasets across MLP and tabular foundation model architectures, NMP-QAT achieves superior compression-accuracy trade-offs over mixed-precision QAT baselines, making it well-suited for Green AI deployments at the network edge.
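As a rough illustration of the per-neuron precision idea described above, the following sketch gives each neuron (one row of incoming weights) its own bit-width, obtained by rounding a continuous latent value. It is not the authors' implementation. The min-max uniform quantizer, the `MIN_BITS`/`MAX_BITS` bounds, and all function names are assumptions made for illustration. Only the forward pass is shown; under a straight-through estimator, the rounding steps would be treated as identity in the backward pass.

```python
import math
from typing import List

# Assumed bit-width bounds for illustration; not taken from the paper.
MIN_BITS, MAX_BITS = 1, 8


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with ties going up."""
    return math.floor(x + 0.5)


def discrete_bits(latent: float) -> int:
    """Map a continuous latent value to a discrete, clamped bit-width.

    With a straight-through estimator, the gradient would pass through
    this rounding as if it were the identity function.
    """
    return max(MIN_BITS, min(MAX_BITS, round_half_up(latent)))


def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Min-max uniform quantization of one neuron's weights to 2**bits levels,
    returned as de-quantized floats ("fake" quantization)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:  # constant row: nothing to quantize
        return list(weights)
    scale = (hi - lo) / (2 ** bits - 1)
    return [lo + round_half_up((w - lo) / scale) * scale for w in weights]


def quantize_layer(rows: List[List[float]], latents: List[float]) -> List[List[float]]:
    """Quantize each neuron (row) with its own bit-width."""
    return [fake_quantize(r, discrete_bits(b)) for r, b in zip(rows, latents)]


def average_bit_width(rows: List[List[float]], latents: List[float]) -> float:
    """Average number of bits per weight across the layer."""
    total_bits = sum(len(r) * discrete_bits(b) for r, b in zip(rows, latents))
    return total_bits / sum(len(r) for r in rows)


# Two neurons with three incoming weights each; latents round to 1 and 3 bits.
rows = [[0.0, 0.6, 1.0], [-1.0, 0.2, 1.0]]
latents = [1.2, 2.6]
quantized = quantize_layer(rows, latents)
avg_bits = average_bit_width(rows, latents)
```

In this toy layer the first neuron stays at 1 bit while the second uses 3 bits, so precision is spent only where the latent value has grown. The average bit-width computed here corresponds to a weights-only compression metric.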
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Classification | Covertype | Accuracy | 94.7 | 40 |
| Classification | QoE | Accuracy | 78.2 | 26 |
| Regression | VoD | MSE | 12.05 | 26 |
| Regression | RSS | MSE | 8.606 | 26 |
| Regression | KVS | MSE | 2.616 | 26 |
| Classification | higgs | Accuracy | 74.58 | 26 |
| Classification | Covertype | Accuracy | 84.2 | 8 |
| Mixed-precision Quantization | RSS | Average Bit-width (Weights, no activation Q) | 1.186 | 3 |
| Mixed-precision Quantization | KVS | Wall-clock Runtime (min:sec) | 4 | 3 |
| Mixed-precision Quantization | VoD | Wall-clock Runtime | 6 | 3 |