
AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

About

Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns (Row-wise, Column-wise, and None) and show that each pair of outlier patterns in a matrix multiplication requires a distinct transform or outlier-handling strategy. We propose AdaHOP (Adaptive Hadamard transform with Outlier-Pattern-aware strategy), which applies the Inner Hadamard Transform (IHT) when inner-dimension mixing properly suppresses the operands' outliers and, when it does not, selectively applies Outlier Extraction (OE), which moves dominant outlier rows or columns into a high-precision path. With fused, hardware-aware Triton kernels, AdaHOP enables training from scratch at MXFP4 precision with BF16-level quality, while achieving up to 3.6X memory compression and a 1.46X end-to-end training speedup over BF16.

Seonggon Kim, Alireza Khodamoradi, Pranathi Vasireddy, Kristof Denolf, Eunhyeok Park • 2026
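
The two mechanisms named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: there are no Triton kernels and no MXFP4 quantizer, and the helper names (`hadamard`, `matmul`, `max_abs`, `row_norm`) and toy matrices are hypothetical. It also does not assume how the paper maps its Row-wise and Column-wise labels onto operands. It only shows that rotating along the inner dimension leaves the product unchanged while spreading a single-channel outlier, that the same rotation cannot shrink a row that is large as a whole, and that such a row can instead be split into a separate path.

```python
import math

def hadamard(n):
    """Orthonormal n x n Hadamard matrix, Sylvester construction (n a power of two)."""
    h = [[1.0]]
    while len(h) < n:
        h = [r + r for r in h] + [r + [-x for x in r] for r in h]
    s = 1.0 / math.sqrt(n)
    return [[x * s for x in r] for r in h]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(r, c)) for c in cols] for r in a]

def max_abs(a):
    return max(abs(x) for r in a for x in r)

def row_norm(r):
    return math.sqrt(sum(x * x for x in r))

K = 8                      # inner (contraction) dimension
H = hadamard(K)            # symmetric and orthonormal, so H @ H is the identity

# Toy operands with small deterministic entries: A is 4 x K, B is K x 3.
A = [[math.sin(1.0 + i * K + j) for j in range(K)] for i in range(4)]
B = [[math.cos(2.0 + i * 3 + j) for j in range(3)] for i in range(K)]

# Case 1 (illustrative): one inner-dimension channel of A holds a large outlier.
A_chan = [r[:] for r in A]
for r in A_chan:
    r[3] = 100.0

# Inner rotation: (A H)(H^T B) equals A B up to floating-point rounding.
A_rot = matmul(A_chan, H)
B_rot = matmul(H, B)       # H^T == H for this construction
ref = matmul(A_chan, B)
out = matmul(A_rot, B_rot)
max_diff = max(abs(x - y) for rr, ro in zip(ref, out) for x, y in zip(rr, ro))
print("product mismatch after rotation:", max_diff)
print("peak before / after rotation:", max_abs(A_chan), max_abs(A_rot))

# Case 2 (illustrative): one whole row of A is large. An orthonormal rotation
# along the inner dimension preserves that row's norm, so it stays large.
A_row = [r[:] for r in A]
A_row[1] = [50.0 * x for x in A_row[1]]
norm_before = row_norm(A_row[1])
norm_after = row_norm(matmul(A_row, H)[1])
print("outlier-row norm before / after rotation:", norm_before, norm_after)

# Extraction sketch: route the outlier row through its own (high-precision)
# path and keep the remainder, whose peak is now small, for the low-precision path.
A_out = [r[:] if i == 1 else [0.0] * K for i, r in enumerate(A_row)]
A_rest = [[0.0] * K if i == 1 else r[:] for i, r in enumerate(A_row)]
full = matmul(A_row, B)
split = [[x + y for x, y in zip(r1, r2)]
         for r1, r2 in zip(matmul(A_rest, B), matmul(A_out, B))]
oe_diff = max(abs(x - y) for rf, rs in zip(full, split) for x, y in zip(rf, rs))
print("peak of remainder vs full operand:", max_abs(A_rest), max_abs(A_row))
print("product mismatch after extraction:", oe_diff)
```

In a real low-precision pipeline the rotated operands and the remainder would be quantized before the multiply. The sketch skips that step, so it only shows why the dynamic range each path has to represent shrinks in the two cases.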

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy: 64.98 | 711 |
| Physical Commonsense Reasoning | PIQA | Accuracy: 76.28 | 696 |
| Question Answering | ARC-E | Accuracy: 55.89 | 523 |
| Language Modeling | LAMBADA | Accuracy: 48.57 | 103 |
| Model Training | Llama3.1-8B (train) | Memory (GB): 20.94 | 7 |
