A Clockwork RNN
About
Sequence prediction and classification are ubiquitous and challenging problems in machine learning that can require identifying complex dependencies between temporally distant inputs. Recurrent Neural Networks (RNNs) can, in theory, cope with these temporal dependencies by virtue of the short-term memory implemented by their recurrent (feedback) connections. In practice, however, they are difficult to train successfully when long-term memory is required. This paper introduces a simple yet powerful modification to the standard RNN architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity and making computations only at its prescribed clock rate. Rather than making the standard RNN model more complex, the CW-RNN reduces the number of RNN parameters, significantly improves performance on the tasks tested, and speeds up network evaluation. The network is demonstrated in preliminary experiments involving two tasks, audio signal generation and TIMIT spoken word classification, where it outperforms both RNN and LSTM networks.
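The clocked update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`recurrent_mask`, `cw_rnn_step`), the power-of-two periods, the equal module sizes, and the rule that a module reads only from modules with an equal or longer period are all assumptions made for illustration.

```python
import numpy as np


def recurrent_mask(periods, module_size):
    """Block mask for the recurrent weights (assumed connectivity rule):
    module i reads only from modules j whose period is >= its own."""
    n = len(periods)
    mask = np.zeros((n * module_size, n * module_size))
    for i in range(n):
        for j in range(n):
            if periods[j] >= periods[i]:
                mask[i * module_size:(i + 1) * module_size,
                     j * module_size:(j + 1) * module_size] = 1.0
    return mask


def cw_rnn_step(h_prev, x, t, W_h, W_x, b, periods, module_size):
    """One hidden-state update at time step t.

    A module is recomputed only when t is a multiple of its period;
    otherwise its previous activations are carried over unchanged.
    """
    mask = recurrent_mask(periods, module_size)
    # Candidate state, computed as in a plain tanh RNN with masked recurrence.
    candidate = np.tanh((W_h * mask) @ h_prev + W_x @ x + b)
    h = h_prev.copy()
    for i, period in enumerate(periods):
        if t % period == 0:  # this module's clock ticks at step t
            rows = slice(i * module_size, (i + 1) * module_size)
            h[rows] = candidate[rows]
    return h


# Toy run: three modules of two units each, with assumed periods 1, 2, 4.
periods, module_size, input_size = [1, 2, 4], 2, 3
hidden_size = len(periods) * module_size
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
W_x = rng.normal(scale=0.5, size=(hidden_size, input_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for t in range(1, 9):
    h = cw_rnn_step(h, rng.normal(size=input_size), t, W_h, W_x, b,
                    periods, module_size)
```

For clarity the sketch computes the full candidate state and then discards the rows of inactive modules. An implementation aiming for the evaluation speed-up mentioned in the abstract would compute only the rows of the modules that are active at step `t`, and the masked-out weight blocks are where the parameter reduction would come from.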
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | PG-19 | -- | 206 |
| Character-level Language Modeling | Penn Treebank (test) | BPC 1.46 | 113 |
| Character-level Language Modeling | text8 (held-out 1M tokens) | BPC 2.92 | 14 |
| Character-level Language Modeling | text8 100M regime (Current stream split) | Current BPC 2.85 | 7 |
| Character-level Language Modeling | text8 100M regime Backward stream | Backward BPC 2.78 | 7 |
| Character-level Language Modeling | text8 100M regime (Forward split) | Forward BPC 2.88 | 7 |
| Character-level Language Modeling | text8 (most recent 1M tokens) | BPC 2.79 | 7 |