BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics
About
Joint Embedding Predictive Architectures (JEPAs) are a novel self-supervised training technique that has shown recent promise across domains. We introduce BERT-JEPA (BEPA), a training paradigm that adds a JEPA objective to BERT-style models, combating collapse in the [CLS] embedding space and reorganizing it into a language-agnostic space. This new structure yields improved performance across multilingual benchmarks.
Taj Gillin, Adam Lalani, Kenneth Zhang, Marcel Mateos Salles • 2026
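The page does not reproduce the training objective itself, so the following is only a minimal sketch of what a JEPA-style [CLS] objective on a BERT encoder could look like. The model name, the pairing of views (e.g. parallel translations), the EMA target encoder, the MLP predictor, and the cosine loss are all illustrative assumptions, not the authors' confirmed recipe.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Hypothetical sketch of a JEPA-style [CLS] objective on a BERT encoder.
# All design choices below are assumptions for illustration, not BEPA's
# published recipe.

class BepaSketch(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased", ema_decay=0.996):
        super().__init__()
        self.context_encoder = AutoModel.from_pretrained(model_name)
        # Target encoder: an EMA copy of the context encoder, never backpropagated.
        self.target_encoder = copy.deepcopy(self.context_encoder)
        for p in self.target_encoder.parameters():
            p.requires_grad = False
        hidden = self.context_encoder.config.hidden_size
        # Small MLP predictor mapping the context [CLS] to the target [CLS].
        self.predictor = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden)
        )
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_target(self):
        # Exponential moving average of the context encoder's weights.
        for p_t, p_c in zip(self.target_encoder.parameters(),
                            self.context_encoder.parameters()):
            p_t.mul_(self.ema_decay).add_(p_c.detach(), alpha=1 - self.ema_decay)

    def forward(self, context_batch, target_batch):
        # [CLS] embedding of the context view (e.g. an English sentence).
        ctx_cls = self.context_encoder(**context_batch).last_hidden_state[:, 0]
        with torch.no_grad():
            # [CLS] embedding of the target view (e.g. its translation).
            tgt_cls = self.target_encoder(**target_batch).last_hidden_state[:, 0]
        pred = self.predictor(ctx_cls)
        # Negative cosine similarity pulls the predicted and target [CLS]
        # embeddings together regardless of the input language.
        return 1 - F.cosine_similarity(pred, tgt_cls, dim=-1).mean()

# Illustrative usage with a hypothetical parallel sentence pair.
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BepaSketch()
en = tok(["the cat sat on the mat"], return_tensors="pt")
es = tok(["el gato se sentó en la alfombra"], return_tensors="pt")
loss = model(en, es)
loss.backward()
model.update_target()
```

In a sketch like this, the stop-gradient through the EMA target encoder is what would keep the [CLS] space from collapsing to a constant vector, while the cross-lingual view pairing is what would push embeddings of translations toward a shared, language-agnostic point.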
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Understanding | GLUE (val) | SST-2: 92.87 | 170 |
| Cross-lingual Question Answering | MLQA v1.0 (test) | F1 (es): 67.8 | 34 |
| Cross-lingual Sentence Classification | XNLI Language Transfer (test) | ar: 71.7 | 3 |