Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery

About

This technical report presents quantization-aware distillation (QAD) and our best practices for recovering the accuracy of NVFP4-quantized large language models (LLMs) and vision-language models (VLMs). QAD distills a full-precision teacher model into a quantized student model using a KL divergence loss. While applying distillation to quantized models is not a new idea, we observe two key advantages of QAD for today's LLMs:
1. It is remarkably effective and stable for models trained through multi-stage post-training pipelines, including supervised fine-tuning (SFT), reinforcement learning (RL), and model merging, where traditional quantization-aware training (QAT) suffers from engineering complexity and training instability.
2. It is robust to data quality and coverage, enabling accuracy recovery without the full training data.

We evaluate QAD across multiple post-trained models, including AceReason Nemotron, Nemotron 3 Nano, Nemotron Nano V2, Nemotron Nano V2 VL (VLM), and Llama Nemotron Super v1, and show consistent recovery to near-BF16 accuracy.
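The core QAD signal described above can be illustrated with a small sketch: a student whose weights pass through an NVFP4-style quantize/dequantize step, scored against a full-precision teacher with a KL divergence loss. This is not the authors' implementation. The helper names (nvfp4_fake_quant, kd_kl_loss, log_softmax, matvec) and the toy weights are hypothetical. Several simplifications are assumed: the per-block scale (16 values per block) is kept in full precision, whereas NVFP4 stores it as FP8 E4M3 together with a per-tensor FP32 scale; ties in rounding go to the smaller magnitude; and the loss uses the forward direction KL(p_teacher || p_student), a common distillation choice that the report's exact formulation may not match.

```python
import math
from typing import List

# Non-negative magnitudes representable in FP4 E2M1; the sign is handled separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0


def _round_to_e2m1(x: float) -> float:
    """Round a scaled value to the nearest signed E2M1 grid point.

    Simplification: ties resolve to the smaller magnitude (first match in the grid),
    not round-to-nearest-even.
    """
    mag = min(abs(x), FP4_MAX)  # saturate at the largest FP4 magnitude
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def nvfp4_fake_quant(values: List[float], block_size: int = 16) -> List[float]:
    """Hypothetical helper: quantize-then-dequantize with one scale per block.

    Assumption: the block scale stays in full precision (real NVFP4 stores it
    as FP8 E4M3 plus a per-tensor FP32 scale).
    """
    out: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / FP4_MAX if amax > 0 else 1.0  # map block amax onto FP4_MAX
        out.extend(_round_to_e2m1(v / scale) * scale for v in block)
    return out


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one token's logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_kl_loss(teacher_logits: List[float], student_logits: List[float],
               temperature: float = 1.0) -> float:
    """KL(p_teacher || p_student) for a single token position (assumed direction)."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


def matvec(rows: List[List[float]], x: List[float]) -> List[float]:
    """Toy output projection: one logit per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


if __name__ == "__main__":
    # Toy setup (hypothetical): hidden size 16 = one NVFP4 block, vocabulary of 8.
    hidden = [math.sin(0.7 * i) for i in range(16)]
    teacher_w = [[math.cos(0.3 * r + 0.11 * c) for c in range(16)] for r in range(8)]
    student_w = [nvfp4_fake_quant(row) for row in teacher_w]
    loss = kd_kl_loss(matvec(teacher_w, hidden), matvec(student_w, hidden))
    print(f"KL(teacher || NVFP4 student) = {loss:.6f}")
```

In an actual QAD run this per-token loss would be averaged over a batch and backpropagated into the student's full-precision weights. QAT-style setups commonly pass gradients through the rounding step with a straight-through estimator, but the report's specific training recipe is not reproduced here.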

Meng Xin, Sweta Priyadarshi, Jingyu Xin, Bilal Kartal, Aditya Vavre, Asma Kuriparambil Thekkumpate, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Ido Shahaf, Akhiad Bercovich, Kinjal Patel, Suguna Varshini Velury, Chenjie Luo, Zhiyu Cheng, Jenny Chen, Chen-Han Yu, Wei Ping, Oleg Rybakov, Nima Tajbakhsh, Oluwatobi Olabiyi, Dusan Stosic, Di Wu, Song Han, Eric Chung, Sharath Turuvekere Sreenivas, Bryan Catanzaro, Yoshi Suhara, Tijmen Blankevoort, Huizi Mao (2026)

Related benchmarks

Task	Dataset	Metric	Result
Visual Question Answering	TextVQA	Accuracy	85.2	1453
Instruction Following	IFEval	IFEval Accuracy	89.3	836
Visual Question Answering	ChartQA	Accuracy	89.4	519
OCR Evaluation	OCRBench	Score	858	350
Visual Question Answering	AI2D	Accuracy	86.7	317
Document Visual Question Answering	DocVQA	Accuracy	93.9	203
Information Visual Question Answering	InfoVQA	Accuracy	78.4	110
Mathematics	AIME25	Accuracy	87.9	103
Code Generation	LiveCodeBench v6	Accuracy	53.3	75
Mathematical Reasoning	MATH 500	Accuracy	97.2	26

Showing 10 of 14 rows
