HOT: Hadamard-based Optimized Training

About

It has become increasingly important to optimize backpropagation to reduce memory usage and computational overhead. Achieving this goal is highly challenging, as multiple objectives must be considered jointly while maintaining training quality. In this paper, we focus on matrix multiplication, which accounts for the largest portion of training costs, and analyze its backpropagation in detail to identify lightweight techniques that offer the best benefits. Based on this analysis, we introduce a novel method, Hadamard-based Optimized Training (HOT). In this approach, we apply Hadamard-based optimizations, such as Hadamard quantization and Hadamard low-rank approximation, selectively and with awareness of the suitability of each optimization for different backward paths. Additionally, we introduce two enhancements: activation buffer compression and layer-wise quantizer selection. Our extensive analysis shows that HOT achieves up to 75% memory savings and a 2.6 times acceleration on real GPUs, with negligible accuracy loss compared to FP32 precision.
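
The two Hadamard-based optimizations named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`hadamard`, `quantize`, `hq_matmul`, `hla_matmul`), the per-tensor symmetric 8-bit quantizer, and the choice to keep the first `rank` Hadamard components are all assumptions made for illustration. The sketch relies only on the fact that an orthonormal Hadamard matrix H satisfies H Hᵀ = I, so a product A B can be rewritten as (A H)(Hᵀ B) before either operand is quantized or truncated.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Sylvester-Hadamard matrix; n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def quantize(x, bits=8):
    """Symmetric per-tensor fake quantization: round to an integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    if scale == 0:
        return x.copy()
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def hq_matmul(a, b, bits=8):
    """Approximate a @ b: rotate the contracted dimension with H, then quantize both operands."""
    h = hadamard(a.shape[1])
    return quantize(a @ h, bits) @ quantize(h.T @ b, bits)

def hla_matmul(a, b, rank):
    """Approximate a @ b using only `rank` Hadamard components of the contracted dimension."""
    h = hadamard(a.shape[1])[:, :rank]  # shape (k, rank)
    return (a @ h) @ (h.T @ b)

# Hypothetical backward-pass product: input gradient g_x = g_y @ w.
rng = np.random.default_rng(0)
g_y = rng.standard_normal((64, 128))  # gradient w.r.t. layer output
w = rng.standard_normal((128, 32))    # weight, shape (out_features, in_features)
exact = g_y @ w

approx_q = hq_matmul(g_y, w, bits=8)
rel_err_q = np.linalg.norm(approx_q - exact) / np.linalg.norm(exact)

approx_full_rank = hla_matmul(g_y, w, rank=128)  # full rank recovers the exact product
approx_low_rank = hla_matmul(g_y, w, rank=32)    # lossy: drops 96 of 128 components

print("relative error, Hadamard quantization:", rel_err_q)
```

With full rank the low-rank path reproduces the exact product, which is a quick way to check that the rotation itself loses nothing; any error comes from quantization or from discarding components. Which of the two approximations suits which backward path is the subject of the paper's analysis and is not modeled here.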

Seonggon Kim, Juncheol Shin, Seung-taek Woo, Eunhyeok Park • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Semantic segmentation	Cityscapes	mIoU	71.72	668
Image Classification	CIFAR10 (test)	Accuracy	95.01	585
Image Classification	ImageNet-100 (val)	Top-1 Accuracy	86.7	223
Image Classification	CIFAR100 (test)	Accuracy	76.95	206
Classification	CIFAR100	Accuracy	92.99	83
Language Modeling	Alpaca	Perplexity	3.29	61
Image Classification	ImageNet 1k (train)	Top-1 Accuracy	69.4	58
Semantic segmentation	VOC 2012	mIoU	79.1	55
Classification	CIFAR10	Top-1 Accuracy	98.6	38
Object Detection	VOC 2007	mAP	85.1	26

Showing 10 of 12 rows
