Structured Inference Networks for Nonlinear State Space Models
About
Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.
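To make the learning objective in the abstract concrete, here is a minimal NumPy sketch of a single-sample evidence lower bound (ELBO) estimate for a nonlinear Gaussian state space model. It is an illustration under stated assumptions, not the authors' implementation. The one-hidden-layer networks, the backward recurrence over observations, the averaging rule that combines the previous latent state with the future summary, the fixed unit noise scales, and every name (`init_params`, `elbo_estimate`, the `W_*` weights) are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2, axis=-1)

def softplus(a):
    """Numerically stable log(1 + exp(a)); keeps standard deviations positive."""
    return np.logaddexp(0.0, a)

def init_params(x_dim, z_dim, h_dim, rng, scale=0.1):
    """Random small weights for the toy generative and inference networks."""
    shapes = {
        "W_zz": (z_dim, z_dim),   # transition: z_{t-1} -> mean of z_t
        "W_ze": (z_dim, h_dim),   # emission hidden layer
        "W_ex": (h_dim, x_dim),   # emission output layer
        "W_xh": (x_dim, h_dim),   # backward RNN input weights
        "W_hh": (h_dim, h_dim),   # backward RNN recurrent weights
        "W_zh": (z_dim, h_dim),   # maps z_{t-1} into the RNN state space
        "W_qm": (h_dim, z_dim),   # posterior mean head
        "W_qs": (h_dim, z_dim),   # posterior scale head
    }
    params = {k: scale * rng.standard_normal(s) for k, s in shapes.items()}
    params["trans_std"] = 1.0     # assumed fixed transition noise
    params["emit_std"] = 1.0      # assumed fixed emission noise
    return params

def elbo_estimate(x, params, rng):
    """Single-sample ELBO estimate for one sequence x of shape (T, x_dim)."""
    T = x.shape[0]
    z_dim = params["W_zz"].shape[0]
    h_dim = params["W_hh"].shape[0]

    # Backward RNN: h[t] summarizes the observations x[t], ..., x[T-1].
    h = np.zeros((T + 1, h_dim))
    for t in reversed(range(T)):
        h[t] = np.tanh(x[t] @ params["W_xh"] + h[t + 1] @ params["W_hh"])

    z_prev = np.zeros(z_dim)      # assumed fixed initial latent state
    elbo = 0.0
    for t in range(T):
        # Structured posterior q(z_t | z_{t-1}, x_{t:T}).
        combined = 0.5 * (np.tanh(z_prev @ params["W_zh"]) + h[t])
        q_mean = combined @ params["W_qm"]
        q_std = softplus(combined @ params["W_qs"]) + 1e-4
        z = q_mean + q_std * rng.standard_normal(z_dim)  # reparameterized sample

        # Generative model: nonlinear transition and emission means.
        p_mean = np.tanh(z_prev @ params["W_zz"])
        x_mean = np.tanh(z @ params["W_ze"]) @ params["W_ex"]

        # log p(x_t | z_t) + log p(z_t | z_{t-1}) - log q(z_t | ...)
        elbo += gaussian_logpdf(x[t], x_mean, params["emit_std"])
        elbo += gaussian_logpdf(z, p_mean, params["trans_std"])
        elbo -= gaussian_logpdf(z, q_mean, q_std)
        z_prev = z
    return elbo

rng = np.random.default_rng(0)
params = init_params(x_dim=3, z_dim=2, h_dim=4, rng=rng)
x = rng.standard_normal((5, 3))   # a toy sequence of 5 observations
print(elbo_estimate(x, params, rng))
```

In an actual training loop, the generative and inference parameters would be updated jointly by gradient ascent on this quantity using an automatic differentiation library. The sketch only evaluates the bound.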
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Forecasting | MIMIC-III (test) | MSE | 0.92 | 43 |
| Irregularly Sampled Time Series Forecasting | USHCN (test) | MSE | 0.83 | 26 |
| Polyphonic music modeling | JSB Chorales | Negative Log-Likelihood (nats) | 6.39 | 14 |
| Polyphonic music modeling | Nottingham (Nott) | NLL (nats) | 2.77 | 14 |
| Polyphonic music modeling | MuseData (Muse) | Negative Log-Likelihood (nats) | 6.83 | 12 |
| Polyphonic music modeling | Piano-midi.de | NLL (nats) | 7.83 | 12 |
| Polyphonic Music Generation | Nottingham (test) | NLL | 2.679 | 11 |
| Interpolation | WSJ0 Audio Spectrogram | Interpolation FID (0.0-0.8) | 10.8 | 10 |
| Generative Modeling | WSJ0 Audio Spectrogram | Log P(x) | 1.55 | 10 |
| Generative Modeling | Human Motion Capture h3.6m | Log Likelihood | 2.31 | 10 |