Sequential Neural Models with Stochastic Layers
About
How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks, which glue a deterministic recurrent neural network and a state space model together to form a stochastic, sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining the nonlinear recursive structure of a recurrent neural network while averaging over the uncertainty in a latent path, like a state space model, we improve on state-of-the-art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving performance comparable to competing methods on polyphonic music modeling.
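The two-layer structure described above can be sketched as an ancestral sampling pass: a deterministic recurrent layer computes d_t from d_{t-1} and the input u_t, a stochastic state-space layer samples z_t conditioned on z_{t-1} and d_t, and the output x_t is emitted from both states. The NumPy sketch below is illustrative only; the dimensions, the linear/tanh parameterizations, and all names (`generate`, `W_d`, `W_mu`, etc.) are assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper).
u_dim, d_dim, z_dim, x_dim, T = 3, 8, 4, 5, 10

# Randomly initialised weights stand in for learned parameters.
W_d  = rng.normal(scale=0.1, size=(d_dim, d_dim + u_dim))  # deterministic recurrence
W_mu = rng.normal(scale=0.1, size=(z_dim, z_dim + d_dim))  # prior mean of z_t
W_sd = rng.normal(scale=0.1, size=(z_dim, z_dim + d_dim))  # prior log-std of z_t
W_x  = rng.normal(scale=0.1, size=(x_dim, z_dim + d_dim))  # emission

def generate(u_seq):
    """Draw one sample x_{1:T} from the generative model given inputs u_{1:T}."""
    d = np.zeros(d_dim)
    z = np.zeros(z_dim)
    xs = []
    for u in u_seq:
        # Deterministic RNN layer: d_t depends only on d_{t-1} and u_t.
        d = np.tanh(W_d @ np.concatenate([d, u]))
        # Stochastic SSM layer: z_t ~ N(mu_t, diag(sd_t^2)), conditioned on z_{t-1} and d_t.
        mu = W_mu @ np.concatenate([z, d])
        sd = np.exp(W_sd @ np.concatenate([z, d]))
        z = mu + sd * rng.normal(size=z_dim)
        # Emission: x_t is generated from both the stochastic and deterministic states.
        xs.append(W_x @ np.concatenate([z, d]))
    return np.stack(xs)

x = generate(rng.normal(size=(T, u_dim)))
print(x.shape)  # (10, 5)
```

The clean split matters for inference: because d_{1:T} is a deterministic function of the inputs, the approximate posterior only needs to be structured over z_{1:T}, mirroring the factorization described in the abstract.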
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Polyphonic music modeling | JSB Chorales | Negative Log-Likelihood (nats) | 4.74 | 14 |
| Polyphonic music modeling | Nottingham (Nott) | Negative Log-Likelihood (nats) | 2.94 | 14 |
| Polyphonic music modeling | MuseData (Muse) | Negative Log-Likelihood (nats) | 6.28 | 12 |
| Polyphonic music modeling | Piano-midi.de | Negative Log-Likelihood (nats) | 8.2 | 12 |
| Generative Modeling | Human Motion Capture h3.6m | Log-Likelihood | 2.94 | 10 |
| Generative Modeling | WSJ0 Audio Spectrogram | Log P(x) | 1.94 | 10 |
| Interpolation | Human Motion Capture h3.6m | FID (0.0-0.8) | 43.5 | 10 |
| Interpolation | WSJ0 Audio Spectrogram | Interpolation FID (0.0-0.8) | 19.4 | 10 |