
From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment

About

Large language models (LLMs) have demonstrated remarkable multilingual capabilities; however, how to evaluate their cross-lingual alignment remains underexplored. Existing alignment benchmarks primarily rely on sentence embeddings, but prior research has shown that neural models tend to induce a non-smooth representation space, which impairs semantic alignment evaluation on low-resource languages. Inspired by neuroscientific findings that similar information activates overlapping neuronal regions, we propose Neuron State-Based Cross-Lingual Alignment (NeuronXA), a more semantically grounded approach to assessing the cross-lingual alignment capabilities of LLMs. We evaluate NeuronXA on several prominent multilingual LLMs (LLaMA, Qwen, Mistral, GLM, and OLMo) across two transfer tasks and three multilingual benchmarks. The results show that with only 100 parallel sentence pairs, NeuronXA achieves a Pearson correlation of 0.9556 with downstream task performance and 0.8514 with transferability. These findings demonstrate NeuronXA's effectiveness in assessing both cross-lingual alignment and transferability, even with a small dataset, and highlight its potential to advance cross-lingual alignment research and improve the semantic understanding of multilingual LLMs.

Chongxuan Huang, Yongshi Ye, Biao Fu, Qifeng Su, Xiaodong Shi • 2025
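To make the idea concrete, here is a minimal sketch of a neuron-state-based alignment score in the spirit of NeuronXA. It binarizes sentence-level activations (a neuron is "active" above a threshold) and averages the cosine similarity of the resulting state vectors over parallel source/target pairs. The function names, the zero threshold, and the random toy activations are illustrative assumptions; the paper's exact neuron-state definition and aggregation may differ.

```python
import math
import random

def neuron_states(activations, threshold=0.0):
    """Binarize activations: a neuron is 'active' if it exceeds the threshold.
    (Assumed binarization rule; the paper's definition may differ.)"""
    return [[1.0 if a > threshold else 0.0 for a in row] for row in activations]

def neuronxa_score(acts_src, acts_tgt, threshold=0.0):
    """Alignment score: mean cosine similarity between the binary
    neuron-state vectors of each parallel source/target sentence pair."""
    states_s = neuron_states(acts_src, threshold)
    states_t = neuron_states(acts_tgt, threshold)
    sims = []
    for s, t in zip(states_s, states_t):
        dot = sum(a * b for a, b in zip(s, t))
        norm = math.sqrt(sum(s)) * math.sqrt(sum(t))
        sims.append(dot / norm if norm else 0.0)
    return sum(sims) / len(sims)

# Toy stand-in for sentence-level LLM activations: 100 parallel pairs and
# 512 hypothetical neurons, with target activations correlated to the source
# to mimic semantically aligned representations.
rng = random.Random(0)
src = [[rng.gauss(0, 1) for _ in range(512)] for _ in range(100)]
tgt = [[0.8 * a + 0.2 * rng.gauss(0, 1) for a in row] for row in src]
score = neuronxa_score(src, tgt)  # close to 1.0 for well-aligned pairs
```

A score near 1.0 indicates that parallel sentences activate largely overlapping neuron sets, which is the property the benchmark correlates with downstream performance and transferability.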

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cross-lingual Alignment Correlation | m-ARC FLORES (test) | Pearson Correlation | 0.9867 | 81 |
| Cross-lingual Alignment Correlation | m-MMLU FLORES (test) | Pearson Correlation | 0.9859 | 81 |
| Cross-lingual Alignment Correlation | Belebele FLORES (test) | Pearson Correlation | 0.9796 | 81 |
| Zero-Shot Cross-Lingual Transfer | XNLI | Pearson Correlation | 0.9639 | 48 |
| Cross-Lingual Knowledge Alignment | BMLAMA | Pearson Correlation | 0.9062 | 48 |
| Pearson correlation analysis | m-ARC | Pearson Correlation | 0.9847 | 13 |
| Downstream task performance correlation | MARC, MMLU, and Belebele (test) | Avg Pearson Correlation | 0.9621 | 8 |
| Zero-Shot Cross-Lingual Transfer | XNLI (test) | Pearson Correlation | 0.9377 | 8 |
| Cross-lingual transferability | FLORES | Avg Pearson Correlation | 0.8597 | 6 |
| Multilingual performance | FLORES | Avg Pearson Correlation | 0.9541 | 6 |
Showing 10 of 14 rows
