Tail Annealing for Heavy-Tailed Flow Matching
About
Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply the soft-log transform $\phi(x) = \mathrm{sign}(x) \cdot \log(1 + |x|)$ coordinate-wise to data before training, then exponentiate samples after generation. A Hill diagnostic decides per-coordinate whether to transform, leaving light-tailed margins untouched at no added complexity. This compresses heavy tails into a range where standard flow matching succeeds, without heavy-tailed base distributions or architectural modifications. We provide theoretical intuition for why this works: the log-transform maps Pareto tails to exponentials, and the induced dynamics implement a form of tail annealing via power transformations. On a 144-configuration multivariate benchmark (3 copulas, $d$ up to 100, 4 tail indices), Log-FM dominates specialized baselines on $W_1$, CVaR$_{99}$, and extreme-quantile metrics, and is the only method with zero severe divergences across 2,880 runs.
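The pre- and post-processing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the fraction of order statistics used by the Hill estimator (`k_frac`), and the decision threshold (`alpha_max`) are all assumptions made for the example, since the abstract does not specify them. The inverse used here, $\phi^{-1}(y) = \mathrm{sign}(y) \cdot (e^{|y|} - 1)$, follows directly from the definition of $\phi$.

```python
import numpy as np

def soft_log(x):
    """phi(x) = sign(x) * log(1 + |x|), applied elementwise."""
    return np.sign(x) * np.log1p(np.abs(x))

def soft_log_inverse(y):
    """Inverse of soft_log: sign(y) * (exp(|y|) - 1)."""
    return np.sign(y) * np.expm1(np.abs(y))

def hill_tail_index(x, k):
    """Hill estimate of the tail index alpha from the k largest values of |x|."""
    a = np.sort(np.abs(x))[::-1]  # order statistics, largest first
    # Mean log-excess of the top k values over the (k+1)-th largest value.
    gamma = np.mean(np.log(a[:k]) - np.log(a[k]))
    return 1.0 / gamma

def fit_mask(data, k_frac=0.05, alpha_max=4.0):
    """Decide per coordinate whether to transform.

    Hypothetical rule (not from the paper): a coordinate is treated as
    heavy-tailed when its Hill estimate falls below alpha_max.
    """
    n, d = data.shape
    k = max(int(k_frac * n), 10)
    alphas = np.array([hill_tail_index(data[:, j], k) for j in range(d)])
    return alphas < alpha_max, alphas

def forward(data, mask):
    """Apply soft_log to the masked coordinates only (before training)."""
    out = data.copy()
    out[:, mask] = soft_log(data[:, mask])
    return out

def backward(samples, mask):
    """Undo the transform on the masked coordinates (after generation)."""
    out = samples.copy()
    out[:, mask] = soft_log_inverse(samples[:, mask])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20_000
    heavy = rng.pareto(1.5, n) + 1.0   # classical Pareto margin, alpha = 1.5
    light = rng.standard_normal(n)     # Gaussian margin
    data = np.column_stack([heavy, light])

    mask, alphas = fit_mask(data)
    z = forward(data, mask)            # train a standard flow-matching model on z
    x_back = backward(z, mask)         # apply the same step to generated samples
    print("Hill estimates:", alphas)
    print("transformed coordinates:", mask)
    print("max round-trip error:", np.max(np.abs(x_back - data)))
```

Any standard flow-matching model would be trained on the output of `forward`, with `backward` applied to its samples. In this sketch the Gaussian coordinate passes through unchanged because its Hill estimate exceeds the assumed threshold.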
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Heavy-tailed Flow Matching | Gumbel + Gaussian copulas (test) | WP1 | 0.074 | 80 |
| Distribution Estimation | Hickling Student-t benchmark original (test) | Wasserstein-1 distance | 0.14 | 30 |
| Flow Matching | Gumbel + Gaussian alpha=1.5 | Catastrophic Failure Fraction (WP1 > 1) | 2 | 20 |
| Flow Matching | Gumbel + Gaussian (alpha=2.0) | Catastrophic Failure Rate | 0.00e+0 | 20 |
| Generative Modeling | Gumbel + Gaussian (median across all configurations, 480 values per cell) | $W_1^P$ (Pareto margins) | 0.187 | 20 |
| Generative Modeling | Fama-French 5 | W1 Distance | 0.133 | 5 |