Cross-lingual Similarity of Multilingual Representations Revisited
About
Related work has used indexes such as CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models. In this paper, we argue that the assumptions of CKA/CCA align poorly with one of the motivating goals of cross-lingual learning analysis, namely explaining zero-shot cross-lingual transfer. We highlight the valuable aspects of cross-lingual similarity that these indexes fail to capture and provide a motivating case study *demonstrating the problem empirically*. We then introduce *Average Neuron-Wise Correlation (ANC)* as a straightforward alternative that avoids the difficulties of CKA/CCA and is well suited to the cross-lingual setting. Finally, we use ANC to show that the previously reported "first align, then predict" pattern occurs not only in masked language models (MLMs) but also in multilingual models trained with *causal language modeling* objectives (CLMs). Moreover, we show that the pattern extends to *scaled versions* of the MLMs and CLMs (up to 85x the size of the original mBERT). Our code is publicly available at https://github.com/TartuNLP/xsim.
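The core idea behind ANC can be sketched in a few lines: given representations of parallel sentences in two languages, compute the Pearson correlation of each neuron (dimension) with its counterpart and average across neurons. This is a minimal NumPy sketch under that assumption (matching neuron indices, no absolute value or permutation step); refer to the paper and the linked repository for the exact definition.

```python
import numpy as np

def average_neuronwise_correlation(X, Y):
    """Sketch of Average Neuron-Wise Correlation (ANC).

    X, Y: arrays of shape (n_sentences, n_neurons) holding hidden
    representations of the same parallel sentences in two languages.
    Assumes neuron i in X corresponds to neuron i in Y.
    Returns the mean per-neuron Pearson correlation.
    """
    # Center each neuron (column) over the sentence axis.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Per-neuron Pearson correlation: covariance / (std_x * std_y).
    num = (Xc * Yc).sum(axis=0)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum(axis=0))
    r = num / den
    # Average across neurons to obtain a single similarity score.
    return float(r.mean())
```

Computing this score per layer, between a source and a target language, yields the layer-wise similarity curves used to observe the "first align, then predict" pattern.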
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cross-Lingual Knowledge Alignment | BMLAMA | Pearson Correlation | 0.9156 | 48 |
| Zero-Shot Cross-Lingual Transfer | XNLI | Pearson Correlation | 0.9082 | 48 |
| Pearson correlation analysis | m-ARC | Pearson Correlation | 0.9683 | 13 |
| Cross-lingual transferability | FLORES | Avg. Pearson Correlation | 0.8537 | 6 |
| Multilingual performance | FLORES | Avg. Pearson Correlation | 0.9313 | 6 |
| Pearson correlation analysis | m-MMLU | Pearson Correlation (r) | 0.976 | 6 |