Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks

About

This work examines the challenges of training neural networks with vector quantization using straight-through estimation. We find that a primary cause of training instability is the discrepancy between the model embedding and the code-vector distribution. We identify the factors that contribute to this issue, including codebook gradient sparsity and the asymmetric nature of the commitment loss, which leads to misaligned code-vector assignments. We propose to address this issue via affine re-parameterization of the code vectors. Additionally, we introduce an alternating optimization to reduce the gradient error introduced by the straight-through estimation. Moreover, we propose an improvement to the commitment loss to ensure better alignment between the codebook representation and the model embedding. These optimization methods improve the mathematical approximation of the straight-through estimation and, ultimately, the model performance. We demonstrate the effectiveness of our methods on several common model architectures, such as AlexNet, ResNet, and ViT, across various tasks, including image classification and generative modeling.
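
The terms used in the abstract can be made concrete with a small, dependency-free Python sketch. It shows nearest-code lookup, a shared affine transform applied to the codebook, and the straight-through backward rule. The per-dimension `scale` and `shift` parameters and all function names are illustrative assumptions; the abstract does not specify the exact form of the re-parameterization.

```python
def affine_codebook(codes, scale, shift):
    """Apply one shared per-dimension affine map to every code vector.

    Assumption: a single (scale, shift) pair shared by the whole codebook.
    """
    return [[s * c + b for c, s, b in zip(code, scale, shift)] for code in codes]


def quantize(z, codebook):
    """Return (index, code) of the code vector nearest to z (squared Euclidean)."""
    dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, code)) for code in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)
    return idx, codebook[idx]


def ste_backward(grad_wrt_quantized):
    """Straight-through estimator: the backward pass treats quantization
    as the identity, so the encoder receives the gradient unchanged."""
    return list(grad_wrt_quantized)


# Hypothetical toy values for illustration only.
codes = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
scale, shift = [2.0, 2.0], [1.0, 1.0]
codebook = affine_codebook(codes, scale, shift)

z = [2.6, 3.2]                      # encoder output (model embedding)
idx, z_q = quantize(z, codebook)    # forward pass: snap to nearest code
grad_z = ste_backward([0.5, -0.25]) # backward pass: copy gradient through
```

Because `scale` and `shift` are shared across all code vectors in this sketch, a gradient arriving at any selected code also moves the unselected ones, which is one way a re-parameterization of this kind could counter sparse codebook gradients.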

Minyoung Huh, Brian Cheung, Pulkit Agrawal, Phillip Isola • 2023

Related benchmarks

Task	Dataset	Result
Codebook utilization	Citeseer	Perplexity 9.03	8
Codebook utilization	Cora	Perplexity 75.32	8
Codebook utilization	wikiCS	Perplexity 83.55	8
Codebook utilization	Ratings	Perplexity 73.82	8
Codebook utilization	questions	Perplexity 66.57	8
Codebook utilization	Pubmed	Perplexity 126.5	8
Codebook utilization	Photo	Perplexity 54.95	8
Codebook utilization	Computer	Perplexity 59.33	8
Codebook utilization	Roman	Perplexity 118.5	8
Graph Representation Learning	ogbn-arxiv (test)	Perplexity 52.39	7

Showing 10 of 12 rows
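
The table reports perplexity as the codebook-utilization metric but does not say how it is computed. A minimal sketch, assuming the definition commonly used for vector-quantized models (the exponential of the entropy of the empirical code-usage distribution); the function name and the sample assignments are hypothetical:

```python
import math
from collections import Counter


def codebook_perplexity(assignments):
    """Perplexity of code usage: exp(-sum_k p_k * log p_k),
    where p_k is the fraction of inputs assigned to code k."""
    if not assignments:
        raise ValueError("assignments must be non-empty")
    total = len(assignments)
    counts = Counter(assignments)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


# Four codes used equally often: perplexity equals the number of codes in use.
uniform = codebook_perplexity([0, 1, 2, 3] * 5)

# Every input mapped to a single code: perplexity collapses to 1.
collapsed = codebook_perplexity([7] * 20)
```

Under this definition, perplexity ranges from 1 (all inputs share one code) up to the codebook size (all codes used equally), so higher values indicate fuller use of the codebook.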
