
Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

About

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the neuron level. We propose Neuron-Level Mixed-Precision QAT (NMP-QAT), where each neuron independently learns its own discrete precision during training. Starting from low-bit precision, NMP-QAT expands bit-width only when training signals demand it, via differentiable surrogates and straight-through estimators, while preserving a fully discrete inference graph. This adaptability extends to both weights and activations, reducing memory movement. Evaluated on telecom and non-telecom datasets across MLP and tabular foundation model architectures, NMP-QAT achieves superior compression-accuracy trade-offs over mixed-precision QAT baselines, making it well-suited for Green AI deployments at the network edge.
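To make the idea of per-neuron precision concrete, the following is a minimal sketch in plain Python of fake-quantizing a layer where every neuron (row of weights) carries its own bit-width, along with the average bit-width that results. It is an illustration under stated assumptions, not the authors' implementation: the function names `quantize_neuron`, `quantize_layer`, and `average_bit_width` are hypothetical, the quantizer is assumed to be symmetric and uniform with at least 2 bits, and the bit-widths are given as fixed inputs rather than learned through differentiable surrogates and straight-through estimators as the abstract describes.

```python
def quantize_neuron(weights, bits):
    """Symmetric uniform fake-quantization of one neuron's weights.

    Assumption (not from the paper): signed integer levels in
    [-qmax, qmax] with qmax = 2**(bits - 1) - 1, so bits must be >= 2.
    """
    if bits < 2:
        raise ValueError("this sketch assumes at least 2 bits per neuron")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # nothing to quantize
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def quantize_layer(weight_rows, bits_per_neuron):
    """Quantize each neuron (row) with its own bit-width."""
    return [quantize_neuron(row, b) for row, b in zip(weight_rows, bits_per_neuron)]


def average_bit_width(weight_rows, bits_per_neuron):
    """Weight-count-weighted mean bit-width across all neurons."""
    total_bits = sum(len(row) * b for row, b in zip(weight_rows, bits_per_neuron))
    total_weights = sum(len(row) for row in weight_rows)
    return total_bits / total_weights


if __name__ == "__main__":
    layer = [[0.6, -1.0, 0.2], [0.1, 0.2, 0.3]]
    bits = [2, 8]  # hypothetical per-neuron precisions
    print(quantize_layer(layer, bits))
    print(average_bit_width(layer, bits))
```

In this sketch the first neuron is kept at 2 bits and the second is given 8 bits, which mirrors the "start low, expand only where needed" idea in spirit: neurons that tolerate coarse levels stay cheap, and the average bit-width reflects the mix.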

Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu, Aneta Vulgarakis Feljan • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Classification | Covertype | Accuracy 94.7 | 40 |
| Classification | QoE | Accuracy 78.2 | 26 |
| Regression | VoD | MSE 12.05 | 26 |
| Regression | RSS | MSE 8.606 | 26 |
| Regression | KVS | MSE 2.616 | 26 |
| Classification | higgs | Accuracy 74.58 | 26 |
| Classification | Covertype | Accuracy 84.2 | 8 |
| Mixed-precision Quantization | RSS | Average Bit-width (Weights, no activation Q) 1.186 | 3 |
| Mixed-precision Quantization | KVS | Wall-clock Runtime (min:sec) 4 | 3 |
| Mixed-precision Quantization | VoD | Wall-clock Runtime 6 | 3 |
Showing 10 of 18 rows
