Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery

About

This technical report presents quantization-aware distillation (QAD) and our best practices for recovering the accuracy of NVFP4-quantized large language models (LLMs) and vision-language models (VLMs). QAD distills a full-precision teacher model into a quantized student model using a KL divergence loss. While applying distillation to quantized models is not a new idea, we observe two key advantages of QAD for today's LLMs: (1) it shows remarkable effectiveness and stability for models trained through multi-stage post-training pipelines, including supervised fine-tuning (SFT), reinforcement learning (RL), and model merging, where traditional quantization-aware training (QAT) suffers from engineering complexity and training instability; (2) it is robust to data quality and coverage, enabling accuracy recovery without the full training data. We evaluate QAD across multiple post-trained models, including AceReason Nemotron, Nemotron 3 Nano, Nemotron Nano V2, Nemotron Nano V2 VL (VLM), and Llama Nemotron Super v1, showing consistent recovery to near-BF16 accuracy.

Meng Xin, Sweta Priyadarshi, Jingyu Xin, Bilal Kartal, Aditya Vavre, Asma Kuriparambil Thekkumpate, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Ido Shahaf, Akhiad Bercovich, Kinjal Patel, Suguna Varshini Velury, Chenjie Luo, Zhiyu Cheng, Jenny Chen, Chen-Han Yu, Wei Ping, Oleg Rybakov, Nima Tajbakhsh, Oluwatobi Olabiyi, Dusan Stosic, Di Wu, Song Han, Eric Chung, Sharath Turuvekere Sreenivas, Bryan Catanzaro, Yoshi Suhara, Tijmen Blankevoort, Huizi Mao • 2026
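
The distillation objective named in the abstract can be illustrated with a small sketch. The code below is not the report's implementation: it assumes the forward direction KL(teacher || student) averaged over tokens (the abstract says only "KL divergence loss"), and the function names and toy logits are hypothetical. In actual QAD the student logits would come from the NVFP4-quantized model and the teacher logits from the full-precision model.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_distillation_loss(teacher_logits, student_logits):
    """Mean over tokens of KL(teacher || student).

    Assumption: forward KL with the teacher as the reference distribution.
    Both arguments are lists of per-token logit lists with matching shapes.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        t_logp = log_softmax(t_row)
        s_logp = log_softmax(s_row)
        # KL(p || q) = sum_i p_i * (log p_i - log q_i)
        total += sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))
    return total / len(teacher_logits)

# Toy logits for two tokens over a three-word vocabulary (made-up values).
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.5, -0.5], [0.0, 0.5, 2.5]]

loss = kl_distillation_loss(teacher, student)       # positive: outputs differ
same = kl_distillation_loss(teacher, teacher)       # zero: outputs match
```

The loss is zero only when the student reproduces the teacher's output distribution, so minimizing it pulls the quantized student toward the full-precision teacher's predictions rather than toward ground-truth labels.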

Related benchmarks

Task                                   Dataset           Result         Rank
Visual Question Answering              TextVQA           Accuracy 85.2  1117
OCR Evaluation                         OCRBench          Score 858      296
Instruction Following                  IFEval            --             292
Visual Question Answering              ChartQA           Accuracy 89.4  239
Visual Question Answering              AI2D              Accuracy 86.7  174
Document Visual Question Answering     DocVQA            Accuracy 93.9  81
Mathematical Reasoning                 MATH 500          Accuracy 97.2  26
Code Generation                        LiveCodeBench v6  Accuracy 53.3  23
Information Visual Question Answering  InfoVQA           Accuracy 78.4  18
Mathematics                            AIME25            Accuracy 87.9  16
Showing 10 of 14 rows
