Tail Annealing for Heavy-Tailed Flow Matching

About

Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply the soft-log transform $\phi(x) = \mathrm{sign}(x) \cdot \log(1 + |x|)$ coordinate-wise to data before training, then apply the inverse transform (an exponentiation) to samples after generation. A Hill diagnostic decides per-coordinate whether to transform, leaving light-tailed margins untouched at no added complexity. This compresses heavy tails into a range where standard flow matching succeeds, without heavy-tailed base distributions or architectural modifications. We provide theoretical intuition for why this works: the log-transform maps Pareto tails to exponentials, and the induced dynamics implement a form of tail annealing via power transformations. On a 144-configuration multivariate benchmark (3 copulas, $d$ up to 100, 4 tail indices), Log-FM dominates specialized baselines on $W_1$, CVaR$_{99}$, and extreme-quantile metrics, and is the only method with zero severe divergences across 2,880 runs.
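The preprocessing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the closed-form inverse $\phi^{-1}(y) = \mathrm{sign}(y) \cdot (e^{|y|} - 1)$ follows directly from the definition of $\phi$, but the function names, the number of order statistics `k`, and the cutoff `alpha_max` used to flag a coordinate as heavy-tailed are assumptions made here for illustration, since the abstract does not state how the Hill diagnostic is thresholded.

```python
import numpy as np

def soft_log(x):
    """Soft-log transform phi(x) = sign(x) * log(1 + |x|), elementwise."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))

def soft_log_inverse(y):
    """Inverse transform phi^{-1}(y) = sign(y) * (exp(|y|) - 1), elementwise."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.expm1(np.abs(y))

def hill_tail_index(x, k):
    """Hill estimate of the tail index alpha from the k largest values of |x|."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # descending order
    top, threshold = a[:k], a[k]  # k upper order statistics and the (k+1)-th
    return 1.0 / np.mean(np.log(top / threshold))

def heavy_tail_mask(data, k, alpha_max):
    """Hypothetical diagnostic: flag columns whose Hill estimate is below alpha_max.

    Both k and alpha_max are illustrative choices, not values from the paper.
    """
    data = np.asarray(data, dtype=float)
    return np.array([hill_tail_index(data[:, j], k) < alpha_max
                     for j in range(data.shape[1])])

def forward(data, mask):
    """Apply soft_log only to the flagged (heavy-tailed) columns."""
    out = np.array(data, dtype=float)
    out[:, mask] = soft_log(out[:, mask])
    return out

def backward(samples, mask):
    """Undo the transform on generated samples for the flagged columns."""
    out = np.array(samples, dtype=float)
    out[:, mask] = soft_log_inverse(out[:, mask])
    return out
```

In use, one would compute `mask` once on the training set, train a standard flow-matching model on `forward(data, mask)`, and pass generated samples through `backward(samples, mask)`. Light-tailed columns are left untouched, which matches the per-coordinate behaviour the abstract describes.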

Jean Pachebat • 2026

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Heavy-tailed Flow Matching	Gumbel + Gaussian copulas (test)	WP1	0.074	80
Distribution Estimation	Hickling Student-t benchmark original (test)	Wasserstein-1 distance	0.14	30
Flow Matching	Gumbel + Gaussian (alpha=1.5)	Catastrophic Failure Fraction (WP1 > 1)	2	20
Flow Matching	Gumbel + Gaussian (alpha=2.0)	Catastrophic Failure Rate	0.00e+0	20
Generative Modeling	Gumbel + Gaussian (median across all configurations, 480 values per cell)	W1^P (Pareto Margins)	0.187	20
Generative Modeling	Fama-French 5	W1 Distance	0.133	5
