
Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences

About

Learning an energy-based model from data samples is a central problem in machine learning. Many recent and popular methods, such as denoising score matching for training energy-based diffusion models, use stochastic interpolants to corrupt data samples at different noise levels indexed by a time variable. This defines a joint density over both the data space and time, and most methods learn its energy through either spatial or temporal differences. We identify distinct failure modes for both of these approaches. To solve them, we propose Spatiotemporal Noise-Contrastive Estimation (stNCE), a framework for learning the energy through joint spatiotemporal differences. stNCE unifies many existing methods and leads to new training objectives. Experiments on images and molecules demonstrate performance competitive with state-of-the-art density estimation methods.
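As a rough illustration of how a stochastic interpolant turns a dataset into samples from a joint density over data space and time, here is a minimal NumPy sketch. It assumes a linear interpolant x_t = (1 - t) x_0 + t ε with Gaussian noise ε and a uniformly distributed time t. The abstract does not specify the interpolant or the stNCE objective, so the schedule and the function names `interpolate` and `sample_joint` are hypothetical and not the authors' implementation.

```python
import numpy as np


def interpolate(x0, t, rng):
    """Corrupt clean samples x0 at times t.

    Assumed (not from the paper): linear interpolant
    x_t = (1 - t) * x0 + t * noise, with standard Gaussian noise.
    """
    noise = rng.standard_normal(x0.shape)
    # Reshape t so it broadcasts over all non-batch dimensions of x0.
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x0.ndim - 1)))
    return (1.0 - t) * x0 + t * noise


def sample_joint(data, batch_size, rng):
    """Draw (x_t, t) pairs from the joint density over data space and time."""
    idx = rng.integers(0, len(data), size=batch_size)  # pick clean samples
    t = rng.uniform(0.0, 1.0, size=batch_size)         # assumed uniform time
    return interpolate(data[idx], t, rng), t
```

At t = 0 the sketch returns the clean data, and at t = 1 it returns pure noise. A model of the joint energy would be trained on the resulting (x_t, t) pairs.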

Hanlin Yu, RuiKang OuYang, Partha Kaushik, Arto Klami, Michael U. Gutmann, Omar Chehab • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Density Estimation | ImageNet 64x64 (test) | Bits Per Sub-Pixel | 2.94 | 71 |
| Density Estimation | MNIST (test) | NLL (bits/dim) | 1 | 69 |
| Density Estimation | MNIST mixture 784 dim (test) | MSE | 2 | 11 |
| Density Estimation | random GMM 20 0.1 (dim 10) reuse sampling scheme 1.0 (test) | MSE | 2.02 | 6 |
| Energy-based modeling | Alanine dipeptide (test) | IID JS Divergence | 0.0072 | 5 |
| Energy-based Density Estimation | Chignolin | IID JS Divergence | 0.0074 | 5 |
