Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Trellis Networks for Sequence Modeling

About

We present trellis networks, a new architecture for sequence modeling. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general weight matrices generalize truncated recurrent networks. We leverage these connections to design high-performing trellis networks that absorb structural and algorithmic elements from both recurrent and convolutional models. Experiments demonstrate that trellis networks outperform the current state of the art methods on a variety of challenging benchmarks, including word-level language modeling and character-level language modeling tasks, and stress tests designed to evaluate long-term memory retention. The code is available at https://github.com/locuslab/trellisnet .

Shaojie Bai, J. Zico Kolter, Vladlen Koltun• 2018

Related benchmarks

TaskDatasetResultRank
Language ModelingWikiText-103 (test)
Perplexity29.19
524
Language ModelingPenn Treebank (test)
Perplexity54.19
411
Pixel-by-pixel Image ClassificationPermuted Sequential MNIST (pMNIST) (test)
Accuracy98.13
79
Sequential Image ClassificationPMNIST (test)
Accuracy (Test)98.13
77
Language ModelingPenn Treebank word-level (test)
Perplexity57
72
Sequential Image ClassificationS-MNIST (test)
Accuracy99.2
70
Word-level Language ModelingWikiText-103 word-level (test)
Perplexity29.19
65
Pixel-level 1-D image classificationSequential MNIST (test)
Accuracy99.2
53
Permuted Sequential Image ClassificationMNIST Permuted Sequential
Test Accuracy Mean98.13
50
Sequential Image ClassificationSequential CIFAR10
Accuracy73.42
48
Showing 10 of 24 rows

Other info

Code

Follow for update