Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Temporal Convolutional Networks for Action Segmentation and Detection

About

The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over a magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.

Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager• 2016

Related benchmarks

TaskDatasetResultRank
Action SegmentationBreakfast
Acc43.3
116
Action Segmentation50Salads
Edit Distance43.1
114
Temporal action segmentation50Salads
Accuracy80.7
112
Temporal action segmentationGTEA
F1 Score @ 10% Threshold85.8
105
Temporal action segmentationBreakfast
Accuracy43.3
102
Action SegmentationGTEA
Accuracy64
49
EV charging demand forecastingPalo Alto (test)
MSE1.40e+3
38
Financial Time Series ForecastingCSI300
Average Return (AR)0.278
26
Financial Time Series ForecastingCSI500
AR0.424
26
Financial Time Series ForecastingAvg. across markets
AR0.394
26
Showing 10 of 34 rows

Other info

Follow for update