HiFloat4 Format for Language Model Inference
About
This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy, capturing inter- and intra-group dynamic range while improving the utilization of the representational space. In addition, the large 64-element group size allows matrix multiplications to be executed largely in fixed-point arithmetic, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
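The storage arithmetic above, and the general shape of block quantization with a shared scale, can be sketched as follows. Note this is a hypothetical single-level illustration: the page does not specify HiF4's three-level scaling hierarchy, so the power-of-two shared scale below is an assumed stand-in, not the actual format.

```python
import numpy as np

BLOCK = 64       # elements per HiF4 unit
ELEM_BITS = 4    # bits per quantized element
META_BITS = 32   # shared scaling metadata per unit

# Average storage cost per value: (64 * 4 + 32) / 64 = 4.5 bits
bits_per_value = (BLOCK * ELEM_BITS + META_BITS) / BLOCK

def quantize_block(x):
    """Quantize a 64-element block to signed 4-bit integers with one
    shared power-of-two scale (a simplified stand-in for HiF4's
    three-level hierarchy, whose details are not given here)."""
    assert x.size == BLOCK
    amax = np.abs(x).max()
    # signed 4-bit range is [-8, 7]; target magnitude 7 for the max
    scale = 2.0 ** np.ceil(np.log2(amax / 7.0)) if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(BLOCK).astype(np.float32)
q, s = quantize_block(x)
err = np.abs(dequantize_block(q, s) - x).max()
```

With a shared power-of-two scale, dequantization is a shift rather than a multiply, which is one reason large fixed-point groups reduce hardware cost; the rounding error per element is bounded by half the scale.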
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 86.6 | 1460 |
| Multi-task Language Understanding | MMLU | Accuracy | 84.77 | 842 |
| Commonsense Reasoning | WinoGrande | Accuracy | 89.11 | 776 |
| Question Answering | ARC Challenge | Accuracy | 60.71 | 749 |
| Question Answering | ARC Easy | Accuracy | 83.04 | 386 |
| Mathematical Reasoning | GSM8K | Accuracy | 95.75 | 358 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 92.44 | 329 |
| Boolean Question Answering | BoolQ | Accuracy | 86.27 | 307 |
| Question Answering | ARC-E | Accuracy | 87.95 | 242 |
| Reading Comprehension | BoolQ | Accuracy | 78.87 | 219 |