
HiFloat4 Format for Language Model Inference

About

This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy that captures inter- and intra-group dynamic range while improving utilization of the representational space. In addition, the large 64-element group size allows matrix multiplications to be executed largely in fixed-point arithmetic, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
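The storage arithmetic above can be sketched in a few lines. The snippet below computes the 4.5-bit average cost directly from the stated layout (64 four-bit elements plus 32 bits of shared metadata) and adds a toy single-scale block quantizer for illustration; note that real HiF4 uses a three-level scaling hierarchy whose exact encoding is not described in the abstract, so the single power-of-two scale here is an assumption, not the paper's scheme.

```python
import math

# Layout stated in the abstract: 64 four-bit elements share 32 bits
# of scaling metadata per HiF4 block.
BLOCK_SIZE = 64      # elements per block
ELEMENT_BITS = 4     # bits per quantized element
METADATA_BITS = 32   # shared scaling metadata per block

def bits_per_value() -> float:
    """Average storage cost per value, metadata included."""
    return (BLOCK_SIZE * ELEMENT_BITS + METADATA_BITS) / BLOCK_SIZE  # 4.5

def quantize_block(x):
    """Toy single-level block quantization to signed 4-bit range [-7, 7].

    HiF4 itself uses a three-level scale hierarchy; this simplified
    stand-in picks one power-of-two scale for the whole 64-element block.
    """
    assert len(x) == BLOCK_SIZE
    max_abs = max(abs(v) for v in x)
    # Smallest power-of-two scale such that x / scale fits in [-7, 7].
    scale_exp = math.ceil(math.log2(max_abs / 7)) if max_abs > 0 else 0
    scale = 2.0 ** scale_exp
    q = [max(-7, min(7, round(v / scale))) for v in x]
    return q, scale_exp

def dequantize_block(q, scale_exp):
    """Reconstruct approximate values from quantized block + shared scale."""
    return [v * 2.0 ** scale_exp for v in q]
```

Averaging the metadata over the whole block is what keeps the overhead low: 32 extra bits spread over 64 elements costs only 0.5 bits per value.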

Yuanyong Luo, Jing Huang, Yu Cheng, Ziwei Yu, Kaihua Tang, Xinda Ma, Xin Wang, Anping Tong, Guipeng Hu, Yun Xu, Mehran Taghian, Peng Wu, Guanglin Li, Yunke Peng, Tianchi Hu, Minqi Chen, Michael Bi Mi, Hu Liu, Xiping Zhou, Junsong Wang, Qiang Lin, Heng Liao • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 86.6 | 1460 |
| Multi-task Language Understanding | MMLU | Accuracy | 84.77 | 842 |
| Commonsense Reasoning | WinoGrande | Accuracy | 89.11 | 776 |
| Question Answering | ARC Challenge | Accuracy | 60.71 | 749 |
| Question Answering | ARC Easy | Accuracy | 83.04 | 386 |
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K) | 95.75 | 358 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 92.44 | 329 |
| Boolean Question Answering | BoolQ | Accuracy | 86.27 | 307 |
| Question Answering | ARC-E | Accuracy | 87.95 | 242 |
| Reading Comprehension | BoolQ | Accuracy | 78.87 | 219 |

(Showing 10 of 19 rows)
