HiFloat4 Format for Language Model Inference
About
This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements together with 32 bits of shared scaling metadata, for an average of 4.5 bits per value. The metadata encodes a three-level scaling hierarchy that captures both inter-group and intra-group dynamic range while improving utilization of the representational space. In addition, the large 64-element group size allows matrix multiplications to be executed largely in fixed-point arithmetic, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
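The bit budget and the shared-scale idea can be sketched as follows. This is a simplified, hypothetical illustration: it uses a single power-of-two scale per 64-element block, whereas HiF4's actual metadata encodes a three-level scaling hierarchy whose exact layout is not specified here.

```python
import math

BLOCK = 64       # elements per HiF4 unit (from the paper)
META_BITS = 32   # shared scaling metadata per unit (from the paper)
ELEM_BITS = 4    # bits per element

def bits_per_value():
    # (64 * 4 + 32) / 64 = 4.5 bits per value on average
    return (BLOCK * ELEM_BITS + META_BITS) / BLOCK

def quantize_block(x):
    """Quantize one 64-element block to signed 4-bit integers plus a
    single shared power-of-two scale (a simplification of HiF4's
    three-level hierarchy)."""
    assert len(x) == BLOCK
    amax = max(abs(v) for v in x) or 1.0
    # Pick a power-of-two scale so the largest magnitude maps near the
    # 4-bit signed maximum (+7).
    scale = 2.0 ** math.ceil(math.log2(amax / 7))
    q = [max(-8, min(7, round(v / scale))) for v in x]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]
```

Because every element in a block shares one scale, the inner products of a matrix multiplication reduce to integer multiply-accumulates on the 4-bit codes, with the scales applied once per block, which is what makes a largely fixed-point datapath possible.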
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 86.6 | 1891 |
| Commonsense Reasoning | WinoGrande | Accuracy | 89.11 | 1085 |
| Question Answering | ARC Challenge | Accuracy | 60.71 | 906 |
| Multi-task Language Understanding | MMLU | Accuracy | 84.77 | 876 |
| Question Answering | ARC Easy | Accuracy | 83.04 | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 92.44 | 572 |
| Question Answering | ARC-E | Accuracy | 87.95 | 416 |
| Mathematical Reasoning | GSM8K | Accuracy | 95.75 | 358 |
| Boolean Question Answering | BoolQ | Accuracy | 86.27 | 323 |
| Reading Comprehension | BoolQ | Accuracy | 78.87 | 279 |