Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings
About
Transformers generalize to novel compositions of structures and entities after being trained on a sufficiently complex dataset, but easily overfit on datasets of insufficient complexity. We observe that when the training set is sufficiently complex, the model encodes sentences sharing a common syntactic structure using a systematic attention pattern. Inspired by this observation, we propose SQ-Transformer (Structurally Quantized), which explicitly encourages systematicity in the embeddings and attention layers, even with a training set of low complexity. At the embedding level, we introduce Structure-oriented Vector Quantization (SoVQ) to cluster word embeddings into several classes of structurally equivalent entities. At the attention level, we devise the Systematic Attention Layer (SAL) and an alternative, the Systematically Regularized Layer (SRL), both of which operate on the quantized word embeddings so that sentences of the same structure are encoded with invariant or similar attention patterns. Empirically, we show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets. In our analysis, we show that SoVQ indeed learns a syntactically clustered embedding space and that SAL/SRL induce generalizable attention patterns, which together lead to improved systematicity.
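The core of SoVQ is assigning each word embedding to one of a small number of codebook vectors, so that structurally equivalent words share a quantized code. Below is a minimal, hypothetical sketch of this nearest-code assignment step in NumPy; the variable names, toy sizes, and plain Euclidean quantizer are illustrative assumptions, not the paper's actual training objective (which additionally shapes the clusters to be structure-oriented).

```python
import numpy as np

# Hypothetical sketch of the quantization step in SoVQ: each word
# embedding is mapped to its nearest codebook vector, so words assigned
# to the same code form one class of (ideally) structurally equivalent
# entities. Sizes below are toy values, not the paper's configuration.

rng = np.random.default_rng(0)
vocab_size, dim, num_classes = 100, 16, 4

embeddings = rng.normal(size=(vocab_size, dim))  # word embeddings
codebook = rng.normal(size=(num_classes, dim))   # class code vectors

# Squared Euclidean distance from every embedding to every code.
d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

assignments = d2.argmin(axis=1)    # structural class index per word
quantized = codebook[assignments]  # quantized embeddings fed to SAL/SRL

print(quantized.shape)  # (100, 16)
```

The attention layers (SAL/SRL) then attend over these shared codes rather than the raw embeddings, which is what makes the attention pattern invariant across sentences of the same structure.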
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Machine Translation | WMT2014 En-Fr (test) | BLEU | 38.38 | 237 |
| Machine Translation | WMT2017 En-De (test) | BLEU | 0.2921 | 46 |
| Semantic Parsing | SCAN around right | Exact-Match Accuracy | 99.63 | 16 |
| Semantic Parsing | COGS (test) | Exact-Match Accuracy | 83.36 | 16 |
| Machine Translation | CoGnition compositional generalization (test) | Instance Error Rate | 18.14 | 15 |
| Semantic Parsing | SCAN 2x augmented (ADDJUMP) | Accuracy | 99.42 | 5 |
| Machine Translation | WMT2017 De-En (test) | BLEU | 31.96 | 4 |
| Machine Translation | WMT2014 Fr-En (test) | BLEU | 35.56 | 2 |