Structured Multidimensional Representation Learning for Large Language Models

About

Transformer architectures achieve state-of-the-art performance across a wide range of pattern recognition and natural language processing tasks, but scaling them brings substantial parameter growth and redundancy in the embedding dimension. In this work, we introduce a structured spectral factorization of the embedding space based on the L-product for third-order tensors. By reshaping token representations into spectral tensor slices and performing attention and feed-forward operations in the transform domain, we obtain a Tensor Transformer architecture that decomposes the encoder into p independent spectral sub-transformers while preserving standard Transformer semantics. We prove that the proposed L-Transformer is spectrally equivalent to p parallel Transformers operating on reduced-dimensional embeddings of size d/p. For a fixed total embedding size, this reduces encoder parameters to approximately 1/p of the baseline (up to lower-order terms such as biases and normalization parameters). When instantiated with a real-valued Discrete Cosine Transform (DCT), the method remains fully differentiable and compatible with existing training pipelines. Beyond compression, the spectral decomposition introduces an inductive bias over embedding frequencies, enabling slice-dependent frequency scaling that improves generalization. Experiments on IMDB and AG News show that the proposed model can substantially reduce encoder parameters (by up to 75% for p=4) while maintaining competitive accuracy. On IMDB, the tensorized encoder matches or improves upon the standard baseline under compression. On AG News at moderate width, we observe a small accuracy decrease in exchange for a fourfold reduction in encoder parameters; at BERT-base width (d=768), performance returns to parity.
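The L-product at the core of the method can be sketched as follows. The sketch assumes the standard definition from the tensor-tensor product literature, A *_L B = L^{-1}( L(A) ∆ L(B) ). Here L is applied along the third (tube) mode, ∆ multiplies matching frontal slices, and L is taken to be the orthonormal DCT-II, as the abstract suggests. This is a minimal NumPy illustration, not the authors' implementation; `dct_matrix`, `mode3` and `l_product` are hypothetical names.

```python
import numpy as np


def dct_matrix(p: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (p x p); row k is frequency k, so L @ L.T = I."""
    n = np.arange(p)
    L = np.sqrt(2.0 / p) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * p))
    L[0, :] /= np.sqrt(2.0)  # the DC row gets weight sqrt(1/p)
    return L


def mode3(A: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Multiply every tube A[i, j, :] by M: out[i, j, k] = sum_n M[k, n] * A[i, j, n]."""
    return np.einsum("ijn,kn->ijk", A, M)


def l_product(A: np.ndarray, B: np.ndarray, L: np.ndarray) -> np.ndarray:
    """A *_L B for A: (n1, n2, p) and B: (n2, n3, p), with L an orthonormal transform."""
    A_hat, B_hat = mode3(A, L), mode3(B, L)          # move to the transform domain
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)  # p independent facewise products
    return mode3(C_hat, L.T)                         # inverse transform (L is orthogonal)


# Example: a 3 x 5 x 4 tensor times a 5 x 2 x 4 tensor gives a 3 x 2 x 4 tensor
rng = np.random.default_rng(0)
L = dct_matrix(4)
A = rng.standard_normal((3, 5, 4))
B = rng.standard_normal((5, 2, 4))
print(l_product(A, B, L).shape)  # (3, 2, 4)
```

In the transform domain, the p facewise products never interact. Any computation expressed through *_L therefore splits into p independent matrix problems. This is the property that lets the encoder decompose into p spectral sub-transformers.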
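The "approximately 1/p" parameter claim can be checked with a back-of-the-envelope count. The count assumes a BERT-style encoder layer with four d x d attention projections, an FFN hidden width of 4d and two LayerNorms. It also assumes that each spectral sub-transformer is an ordinary layer of width d/p and that the fixed DCT contributes no learned parameters. These layer details are illustrative assumptions, not the paper's exact configuration, and the function names are hypothetical.

```python
def encoder_layer_params(d: int, ffn_mult: int = 4) -> int:
    """Learned parameters of one BERT-style encoder layer of width d.

    Assumed layout (illustrative, not taken from the paper):
    Q, K, V, O projections (d x d weights + bias), a two-layer FFN with
    hidden width ffn_mult * d, and two LayerNorms (gain + bias).
    """
    attention = 4 * (d * d + d)
    ffn = (d * ffn_mult * d + ffn_mult * d) + (ffn_mult * d * d + d)
    layer_norms = 2 * (2 * d)
    return attention + ffn + layer_norms


def tensorized_layer_params(d: int, p: int, ffn_mult: int = 4) -> int:
    """p independent spectral sub-layers, each an ordinary layer of width d/p.

    The DCT along the slice axis is fixed, so it adds no learned parameters.
    """
    if d % p != 0:
        raise ValueError("d must be divisible by p")
    return p * encoder_layer_params(d // p, ffn_mult)


if __name__ == "__main__":
    d, p = 768, 4
    base = encoder_layer_params(d)
    tens = tensorized_layer_params(d, p)
    print(base, tens, round(tens / base, 4))  # prints: 7087872 1779456 0.2511
```

For d = 768 and p = 4, the weight matrices shrink by exactly 1/p because p(d/p)^2 = d^2/p. The small excess over 0.25 comes from biases and normalization parameters, which total 13d in both cases under these assumptions. The result is a per-layer reduction of about 74.9%, consistent with the "up to 75%" figure.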

Alaa El Ichi, Khalide Jbilou, Mohamed El Guide, Franck Dufrenois • 2026

Related benchmarks

Task	Dataset	Metric	Score	
Text Classification	AG News (test)	Accuracy (%)	91.52	293
Sentiment Analysis	IMDB (d=128)	Accuracy (%)	82.02	7
Topic Classification	AG News, d=256 (test)	Accuracy (%)	90.76	3
Text Classification	IMDB (test)	Accuracy (%)	82.02	2
Topic Classification	AG News, BERT-base width (test)	Accuracy (%)	91.52	2
