Better Embeddings with Coupled Adam

About

Despite their remarkable capabilities, LLMs learn word representations that exhibit the undesirable yet poorly understood feature of anisotropy. In this paper, we argue that the second moment in Adam is a cause of anisotropic embeddings, and suggest a modified optimizer called Coupled Adam to mitigate the problem. Our experiments demonstrate that Coupled Adam significantly improves the quality of embeddings, while also leading to better upstream and downstream performance on large enough datasets.
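The abstract names the second moment in Adam as the cause but does not spell out the modification. As a rough illustration only, the sketch below assumes that "coupled" means the bias-corrected second-moment estimate is averaged over the vocabulary axis, so that every embedding row shares the same effective step scale per dimension. The function `adam_step`, its `coupled` flag, and the helper `zeros` are hypothetical names chosen for this example and are not taken from the authors' code.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, coupled=False):
    """One Adam update on an embedding matrix stored as a list of rows.

    With coupled=True, the bias-corrected second moment is averaged over
    the rows (vocabulary axis). This is an assumption about what
    "coupled" means, not a confirmed reproduction of the paper's method.
    Mutates m and v in place and returns the new parameters.
    """
    n_rows, dim = len(params), len(params[0])
    # Standard exponential moving averages of the gradient and its square.
    for i in range(n_rows):
        for j in range(dim):
            g = grads[i][j]
            m[i][j] = beta1 * m[i][j] + (1 - beta1) * g
            v[i][j] = beta2 * v[i][j] + (1 - beta2) * g * g
    c1 = 1 - beta1 ** t  # bias correction for the first moment
    c2 = 1 - beta2 ** t  # bias correction for the second moment
    v_hat = [[v[i][j] / c2 for j in range(dim)] for i in range(n_rows)]
    if coupled:
        # Assumed coupling: replace each row's second moment by the row mean.
        col_mean = [sum(v_hat[i][j] for i in range(n_rows)) / n_rows
                    for j in range(dim)]
        v_hat = [list(col_mean) for _ in range(n_rows)]
    return [[params[i][j] - lr * (m[i][j] / c1) / (math.sqrt(v_hat[i][j]) + eps)
             for j in range(dim)] for i in range(n_rows)]

def zeros(rows, dim):
    """Hypothetical helper: a rows x dim matrix of zeros."""
    return [[0.0] * dim for _ in range(rows)]

# Two one-dimensional "embeddings": a frequent token with a large gradient
# and a rare token with a small one.
grads = [[1.0], [0.01]]
standard = adam_step(zeros(2, 1), grads, zeros(2, 1), zeros(2, 1), t=1)
coupled = adam_step(zeros(2, 1), grads, zeros(2, 1), zeros(2, 1), t=1,
                    coupled=True)
```

In this toy setup, standard Adam normalizes each row by its own second moment, so both rows move by nearly the same amount despite a 100x difference in gradient size. Under the assumed coupling, the shared normalizer preserves the relative gradient magnitudes across rows.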

Felix Stollenwerk, Tobias Stollenwerk • 2025

Related benchmarks

Task                      Dataset                           Metric            Result  Rank
Downstream Task           11 Downstream Tasks Aggregate     Average Accuracy  39.2    32
Embedding Space Analysis  OpenWebText                       Iso               0.98    18
Language Modeling         OpenWebText (test)                Loss              2.65    18
Language Modeling         SlimPajama large-scale (train)    L(ψ)              2.129   8
