Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

About

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the neuron level. We propose Neuron-Level Mixed-Precision QAT (NMP-QAT), where each neuron independently learns its own discrete precision during training. Starting from low-bit precision, NMP-QAT expands bit-width only when training signals demand it, via differentiable surrogates and straight-through estimators, while preserving a fully discrete inference graph. This adaptability extends to both weights and activations, reducing memory movement. Evaluated on telecom and non-telecom datasets across MLP and tabular foundation model architectures, NMP-QAT achieves superior compression-accuracy trade-offs over mixed-precision QAT baselines, making it well-suited for Green AI deployments at the network edge.

Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu, Aneta Vulgarakis Feljan • 2026
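
To make the idea of per-neuron precision concrete, the following is a minimal, forward-pass-only sketch, not the authors' implementation. It assumes symmetric uniform quantization with one scale per neuron and at least 2 bits per neuron; the function names (`quantize_neuron`, `quantize_layer`, `average_bit_width`) and the fixed bit-widths are hypothetical. In NMP-QAT the bit-widths are learned during training, which this sketch does not attempt to reproduce.

```python
def quantize_neuron(weights, bits):
    """Symmetric uniform quantization of one neuron's weights to `bits` bits.

    Assumption for this sketch: signed integer levels in [-qmax, qmax]
    and a single per-neuron scale derived from the largest magnitude.
    """
    if bits < 2:
        raise ValueError("this sketch assumes at least 2 bits per neuron")
    qmax = 2 ** (bits - 1) - 1  # 1 for 2 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # nothing to quantize
    scale = max_abs / qmax
    # round() is the non-differentiable step. During training, a
    # straight-through estimator would treat its gradient as identity.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def quantize_layer(weight_rows, bits_per_neuron):
    """Quantize each neuron (row) with its own bit-width."""
    return [quantize_neuron(row, b) for row, b in zip(weight_rows, bits_per_neuron)]


def average_bit_width(weight_rows, bits_per_neuron):
    """Weight-count-weighted mean bit-width across all neurons in a layer."""
    total_bits = sum(len(row) * b for row, b in zip(weight_rows, bits_per_neuron))
    total_weights = sum(len(row) for row in weight_rows)
    return total_bits / total_weights


# Hypothetical two-neuron layer: neuron 0 stays at 2 bits, neuron 1 uses 4 bits.
layer = [[0.5, -1.0, 0.25], [0.1, 0.2, -0.3]]
bits = [2, 4]
quantized = quantize_layer(layer, bits)
avg_bits = average_bit_width(layer, bits)
```

Here the 2-bit neuron keeps only its largest-magnitude weight, while the 4-bit neuron retains a closer approximation of all three weights. A method that starts low and expands precision only where needed would aim to give extra bits to exactly those neurons whose accuracy suffers at low precision.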

Related benchmarks

Task	Dataset	Result
Classification	Covertype	Accuracy 94.7	52
Classification	QoE	Accuracy 78.2	26
Regression	VoD	MSE 12.05	26
Regression	RSS	MSE 8.606	26
Regression	KVS	MSE 2.616	26
Classification	higgs	Accuracy 74.58	26
Classification	Covertype	Accuracy 84.2	8
Mixed-precision Quantization	RSS	Average Bit-width (Weights, no activation Q) 1.186	3
Mixed-precision Quantization	KVS	Wall-clock Runtime (min:sec) 4	3
Mixed-precision Quantization	VoD	Wall-clock Runtime 6	3

Showing 10 of 18 rows
