
Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats

About

As LLMs scale, low-bit floating-point formats like MXFP and NVFP4 offer new opportunities for precision and efficiency. In this work, we evaluate HiFloat (HiF8 and HiF4), a family of formats tailored for Ascend NPUs. Through a rigorous comparison across weight-activation and KV-cache quantization tasks, we provide three key insights: (1) INT8 suits narrow-range data, while floating-point formats excel on high-variance data; (2) in 4-bit regimes, HiF4's hierarchical scaling prevents the accuracy collapse seen in integer formats; and (3) HiFloat is fully compatible with state-of-the-art post-training quantization frameworks. Overall, HiFloat offers a practical path to high-efficiency LLM inference on NPUs.
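To make insight (1) concrete, the following is a minimal NumPy sketch, not the paper's HiFloat implementation: it compares symmetric per-tensor INT8 quantization against a toy E4M3-style 8-bit float quantizer on a narrow-range tensor and on a heavy-tailed tensor with outliers. The quantizer details and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8: one absolute step size for the whole tensor,
    so a single outlier inflates the step and hurts the bulk of small values."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

def quantize_fp8_like(x, exp_bits=4, man_bits=3):
    """Toy E4M3-style 8-bit float (assumed for illustration): per-element exponent
    with a few mantissa bits, i.e. relative rather than absolute precision."""
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    out = np.zeros_like(mag)
    nz = mag > 0
    exp = np.floor(np.log2(mag[nz]))
    bias = 2 ** (exp_bits - 1) - 1
    exp = np.clip(exp, -bias, bias)          # crude exponent-range clamp
    step = 2.0 ** (exp - man_bits)           # spacing of representable values in each binade
    out[nz] = np.round(mag[nz] / step) * step
    return sign * out

rng = np.random.default_rng(0)
narrow = rng.uniform(-1.0, 1.0, 100_000)                 # narrow, uniform-range data
heavy = rng.standard_normal(100_000) * np.where(         # heavy-tailed data with rare outliers
    rng.uniform(size=100_000) < 0.01, 50.0, 1.0)

for name, x in [("narrow-range", narrow), ("heavy-tailed", heavy)]:
    mse_int8 = np.mean((x - quantize_int8(x)) ** 2)
    mse_fp8 = np.mean((x - quantize_fp8_like(x)) ** 2)
    print(f"{name:12s}  INT8 MSE={mse_int8:.3e}  FP8-like MSE={mse_fp8:.3e}")
```

Under these assumptions, INT8 yields the lower error on the narrow-range tensor (its uniform grid is finer there), while the float-style format wins on the heavy-tailed tensor, where the outlier-driven INT8 step size swamps the small values.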

Pengxiang Zhao, Hui-Ling Zhen, Xing Li, Han Bao, Weizhe Lin, Zhiyuan Yang, Manyi Zhang, Yuanyong Luo, Ziwei Yu, Xin Wang, Mingxuan Yuan, Xianzhi Yu, Zhenhua Dong • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 57.3 | 1891 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 34.89 | 1624 |
| Language Modeling | C4 | Perplexity (PPL) | 15.28 | 1071 |
| Language Modeling | WikiText | Perplexity (PPL) | 9.79 | 732 |
| Long-Context Language Modeling | LongBench | Average Score | 34.55 | 164 |
| Multi-task Language Understanding | MMLU | Accuracy | 72.9 | 111 |
| Reasoning | ARC Challenge | Accuracy | 34 | 93 |
| Model Evaluation Summary | Overall Aggregate | Average Score | 1.003 | 22 |
| Quantization Performance Summary | Aggregated benchmarks (HellaSwag, MMLU, ARC-C, MATH-500) | Average Score | 1.014 | 22 |
| Quantization Robustness Evaluation | Average across WikiText, C4, HellaSwag, MMLU, ARC-C, MATH-500, and GSM8K | Accuracy Loss Delta (%) | -0.29 | 5 |
