Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture -- Bridging Predictive and Generative Self-Supervised Learning

About

The Joint-Embedding Predictive Architecture (JEPA) is often seen as a non-generative alternative to likelihood-based self-supervised learning, emphasizing prediction in representation space rather than reconstruction in observation space. We argue that the resulting separation from probabilistic generative modeling is largely rhetorical rather than structural: the canonical JEPA design, coupled encoders with a context-to-target predictor, mirrors the variational posteriors and learned conditional priors obtained when variational inference is applied to a particular class of coupled latent-variable models, and standard JEPA can be viewed as a deterministic specialization in which regularization is imposed via architectural and training heuristics rather than an explicit likelihood. Building on this view, we derive the Variational JEPA (Var-JEPA), which makes the latent generative structure explicit by optimizing a single Evidence Lower Bound (ELBO). This yields meaningful representations without ad-hoc anti-collapse regularizers and allows principled uncertainty quantification in the latent space. We instantiate the framework for tabular data (Var-T-JEPA) and achieve strong representation learning and downstream performance, consistently improving over T-JEPA while remaining competitive with strong raw-feature baselines.
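As a rough illustration of the kind of objective the abstract describes, the following sketch writes out an ELBO for one simple coupled latent-variable model. Everything in it is an assumption made for illustration and is not taken from the paper: the symbols x (context view), y (target view), z_x and z_y (their latents), the factorization of the generative model, and the mean-field form of the variational posterior. The paper's actual model and bound may differ.

```latex
% Assumed generative model (illustrative, not the paper's definition):
%   p(x, y, z_x, z_y) = p(z_x) \, p(z_y \mid z_x) \, p(x \mid z_x) \, p(y \mid z_y)
% Assumed variational posterior (the two "encoders"):
%   q(z_x, z_y \mid x, y) = q(z_x \mid x) \, q(z_y \mid y)
\begin{align}
\log p(x, y)
  \;\ge\; \mathcal{L}(x, y)
  &= \mathbb{E}_{q(z_x \mid x)}\big[\log p(x \mid z_x)\big]
   + \mathbb{E}_{q(z_y \mid y)}\big[\log p(y \mid z_y)\big] \\
  &\quad - \mathrm{KL}\big(q(z_x \mid x) \,\|\, p(z_x)\big)
   - \mathbb{E}_{q(z_x \mid x)}\Big[\mathrm{KL}\big(q(z_y \mid y) \,\|\, p(z_y \mid z_x)\big)\Big]
\end{align}
```

Under these assumptions, the last term plays the role of the JEPA predictor loss: the learned conditional prior p(z_y | z_x) predicts the target latent from the context latent, and the KL term penalizes disagreement with the target encoder q(z_y | y). Making both encoders deterministic and dropping the likelihood terms gives something close to the standard JEPA setup, which is then left to rely on heuristics to avoid collapse.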

Moritz Gögl, Christopher Yau • 2026

Related benchmarks

Task	Dataset	Result
Classification	MNIST (test)	Macro F1 93.6	68
Binary Classification	Credit Card (test)	Macro F1 79.8	40
Classification	adult (AD) (test)	Macro F1 92.3	36
Classification	Covertype (CO) (test)	Macro F1 81.6	36
Classification	Bank Marketing (BM) (test)	Macro F1 91.2	36
Classification	Electricity (EL) (test)	Macro F1 89.1	36
Classification	SIM (test)	Macro F1 95.5	36

