Optimizing Large Language Model Training Using FP4 Quantization
About
The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge due to significant quantization errors and limited representational capacity. This work introduces the first FP4 training framework for LLMs, addressing these challenges with two key innovations: a differentiable quantization estimator for precise weight updates and an outlier clamping and compensation strategy to prevent activation collapse. To ensure stability, the framework integrates a mixed-precision training scheme and vector-wise quantization. Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8, with minimal degradation, scaling effectively to 13B-parameter LLMs trained on up to 100B tokens. With the emergence of next-generation hardware supporting FP4, our framework sets a foundation for efficient ultra-low precision training.
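As a rough illustration of two ideas named in the abstract, the sketch below simulates vector-wise FP4 quantization and a simple outlier clamp with a residual for compensation. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the E2M1 value grid, the per-row absmax scaling, the quantile threshold, and the function names `quantize_fp4_rowwise` and `clamp_with_residual`. The differentiable quantization estimator is not covered here.

```python
import numpy as np

# Assumed E2M1 FP4 magnitudes; the abstract does not name the exact FP4 format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def quantize_fp4_rowwise(x):
    """Simulate FP4 quantization with one absmax scale per row (hypothetical helper).

    Returns the dequantized array, i.e. values snapped to the scaled FP4 grid.
    """
    absmax = np.max(np.abs(x), axis=1, keepdims=True)
    safe_absmax = np.where(absmax > 0, absmax, 1.0)  # avoid dividing by zero
    scale = FP4_GRID[-1] / safe_absmax               # map each row's max to 6.0
    scaled = x * scale
    # Pick the nearest representable magnitude, then restore the sign.
    idx = np.argmin(np.abs(np.abs(scaled)[..., None] - FP4_GRID), axis=-1)
    quantized = np.sign(scaled) * FP4_GRID[idx]
    return quantized / scale


def clamp_with_residual(x, q=0.99):
    """Clamp values outside the [1 - q, q] quantiles (hypothetical helper).

    Returns the clamped array and the residual, so that clamped + residual == x.
    The residual is mostly zeros and could be handled in higher precision.
    """
    lo, hi = np.quantile(x, 1.0 - q), np.quantile(x, q)
    clamped = np.clip(x, lo, hi)
    return clamped, x - clamped


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal((4, 16))
    acts[0, 0] = 50.0  # inject an outlier
    clamped, residual = clamp_with_residual(acts)
    approx = quantize_fp4_rowwise(clamped) + residual
    print("mean abs error with clamping:   ", np.abs(approx - acts).mean())
    print("mean abs error without clamping:", np.abs(quantize_fp4_rowwise(acts) - acts).mean())
```

Clamping before quantization keeps a single large activation from stretching the row's scale, which would otherwise push most of the remaining values to zero on a grid with only eight magnitudes.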
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | PIQA | Accuracy | 73.78 | 757 |
| Question Answering | ARC Challenge | Accuracy (ARC) | 39.85 | 598 |
| Question Answering | OBQA | Accuracy | 39.6 | 347 |
| Logical reasoning | LogiQA | LogiQA Accuracy | 30.88 | 251 |
| Reading Comprehension | BoolQ | Accuracy (BoolQ) | 62.2 | 228 |
| Question Answering | ARC Easy | Accuracy | 67.97 | 210 |
| Word Prediction | LAMBADA | Accuracy | 46.89 | 192 |
| Commonsense Reasoning | HellaSwag | -- | -- | 43 |
| Language Modeling | Zero-shot Perplexity Suite (Lambada, Pile 10k, Wikitext) | Average Perplexity | 33.99 | 6 |
| Question Answering | SciQ | Accuracy (SciQ) | 85.8 | 6 |