AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

About

Hadamard transforms have become a key tool for stabilizing low-precision training, but existing methods apply them uniformly across tensors and computation paths. We show that this one-size-fits-all strategy is inherently limited: Hadamard smoothing reduces quantization error only when its direction is properly aligned with the operand's outlier structure. Through a systematic study of weights, activations, and gradients in LLM training, we identify three stable outlier patterns (Row-wise, Column-wise, and None) and show that each outlier pattern pair in matrix multiplication requires a distinct transform or outlier-handling strategy. We propose AdaHOP (Adaptive Hadamard transform with Outlier-Pattern-aware strategy), which applies the Inner Hadamard Transform (IHT) when inner-dimension mixing properly suppresses the operands' outliers, and selectively applies Outlier Extraction (OE), which moves dominant outlier rows or columns into a high-precision path, when it does not. With fused, hardware-aware Triton kernels, AdaHOP enables training from scratch at MXFP4 precision with BF16-level quality, while achieving up to 3.6X memory compression and 1.46X end-to-end training speedup over BF16.
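
The two mechanisms named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a toy symmetric per-row quantizer in place of MXFP4, quantizes only the left operand to isolate the effect, and all helper names (`hadamard`, `fake_quant_rows`, `matmul_iht`, `matmul_oe`) are hypothetical. It relies only on the identity (A H)(H^T B) = A B for an orthonormal Hadamard matrix H, and on splitting a matmul into a high-precision part and a quantized part.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def fake_quant_rows(x, levels=7):
    """Toy symmetric quantizer with one scale per row (a stand-in, NOT MXFP4)."""
    scale = np.abs(x).max(axis=1, keepdims=True) / levels
    scale = np.where(scale == 0, 1.0, scale)
    return np.round(x / scale) * scale

def matmul_plain(A, B):
    """Quantize A directly, with no transform."""
    return fake_quant_rows(A) @ B

def matmul_iht(A, B):
    """Rotate along the inner dimension before quantizing: (A H)(H^T B) = A B."""
    H = hadamard(A.shape[1])
    return fake_quant_rows(A @ H) @ (H.T @ B)

def matmul_oe(A, B, k=1):
    """Send the k largest-magnitude inner-dimension columns of A through a
    full-precision path and quantize only the remainder."""
    idx = np.argsort(np.abs(A).max(axis=0))[-k:]
    mask = np.zeros(A.shape[1], dtype=bool)
    mask[idx] = True
    hi = A[:, mask] @ B[mask, :]                      # high-precision path
    lo = fake_quant_rows(A[:, ~mask]) @ B[~mask, :]   # low-precision path
    return hi + lo

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
A[:, 5] += 50.0                      # synthetic outlier along one inner-dim index
B = rng.standard_normal((128, 32))
ref = A @ B

def rel_err(C):
    return np.linalg.norm(C - ref) / np.linalg.norm(ref)

err_plain = rel_err(matmul_plain(A, B))
err_iht = rel_err(matmul_iht(A, B))
err_oe = rel_err(matmul_oe(A, B, k=1))
print(f"plain={err_plain:.4f}  iht={err_iht:.4f}  oe={err_oe:.4f}")
```

In this synthetic case the single outlier dominates each row's scale, so direct quantization loses most of the remaining entries; rotating spreads the outlier across the row, and extraction removes it from the quantized path altogether. Which strategy AdaHOP assigns to which outlier pattern pair is defined in the paper and is not reproduced here.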

Seonggon Kim, Alireza Khodamoradi, Pranathi Vasireddy, Kristof Denolf, Eunhyeok Park • 2026

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	HellaSwag	HellaSwag Accuracy	64.98	897
Physical Commonsense Reasoning	PIQA	Accuracy	76.28	724
Question Answering	ARC-E	Accuracy	55.89	544
Language Modeling	LAMBADA	Accuracy	48.57	114
Model Training	Llama3.1-8B (train)	Memory (GB)	20.94	7
