
Cross-lingual Similarity of Multilingual Representations Revisited

About

Prior work has used indexes such as CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models. In this paper, we argue that the assumptions behind CKA/CCA align poorly with one of the motivating goals of cross-lingual analysis, namely explaining zero-shot cross-lingual transfer. We highlight valuable aspects of cross-lingual similarity that these indexes fail to capture and provide a motivating case study demonstrating the problem empirically. We then introduce Average Neuron-Wise Correlation (ANC) as a straightforward alternative that avoids the difficulties of CKA/CCA and is well suited to the cross-lingual context. Finally, we use ANC to show that the previously reported "first align, then predict" pattern occurs not only in masked language models (MLMs) but also in multilingual models trained with causal language modeling objectives (CLMs). Moreover, we show that the pattern extends to scaled versions of the MLMs and CLMs (up to 85x the size of the original mBERT). Our code is publicly available at https://github.com/TartuNLP/xsim
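The abstract describes ANC as averaging correlations computed neuron by neuron. A minimal sketch of that idea, assuming representations for parallel sentences in two languages are stacked into matrices of shape (n_sentences, hidden_dim) and that the metric averages absolute per-neuron Pearson correlations (the function name and exact aggregation are illustrative assumptions, not the paper's verbatim definition):

```python
import numpy as np

def average_neuronwise_correlation(X, Y):
    """Sketch of Average Neuron-Wise Correlation (ANC).

    X, Y: (n_sentences, hidden_dim) activations for translation-parallel
    sentences in two languages. For each neuron (column), compute the
    Pearson correlation between its activations across the two languages,
    then average the absolute correlations over all neurons.
    Details may differ from the exact definition in the paper.
    """
    # Center each neuron's activations
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Per-neuron Pearson correlation: covariance over product of std devs
    num = (Xc * Yc).sum(axis=0)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum(axis=0))
    corr = num / den
    # Aggregate to a single similarity score in [0, 1]
    return float(np.abs(corr).mean())
```

Computed per layer, a score like this can trace how representation similarity between languages evolves through a model's depth, which is the kind of analysis behind the "first align, then predict" observation.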

Maksym Del, Mark Fishel • 2022

Related benchmarks

Task                              | Dataset | Metric                  | Result | Rank
Cross-Lingual Knowledge Alignment | BMLAMA  | Pearson Correlation     | 0.9156 | 48
Zero-Shot Cross-Lingual Transfer  | XNLI    | Pearson Correlation     | 0.9082 | 48
Pearson correlation analysis      | m-ARC   | Pearson Correlation     | 0.9683 | 13
Cross-lingual transferability     | FLORES  | Avg Pearson Correlation | 0.8537 | 6
Multilingual performance          | FLORES  | Avg Pearson Correlation | 0.9313 | 6
Pearson correlation analysis      | m-MMLU  | Pearson Correlation (r) | 0.976  | 6
