Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)
About
This report describes our submission to the ActivityNet Challenge at CVPR 2019. We use a 3D convolutional neural network (CNN) based front-end and an ensemble of temporal convolution and LSTM classifiers to predict whether a visible person is speaking or not. Our results show significant improvements over the baseline on the AVA-ActiveSpeaker dataset.
Joon Son Chung • 2019
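The ensemble step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each classifier (temporal convolution and LSTM) emits one speaking probability per frame of a face track, and that the ensemble simply averages them. The abstract does not state the actual fusion rule, so the averaging, the 0.5 threshold, the function names, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def ensemble_scores(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame speaking probabilities across classifiers.

    Assumption: plain averaging; the report may combine models differently.
    """
    if not score_sets:
        raise ValueError("need at least one classifier output")
    n_frames = len(score_sets[0])
    if any(len(s) != n_frames for s in score_sets):
        raise ValueError("all classifiers must score the same frames")
    # zip(*score_sets) groups the scores that belong to the same frame.
    return [sum(frame) / len(score_sets) for frame in zip(*score_sets)]


def is_speaking(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Turn fused probabilities into per-frame speaking / not-speaking labels."""
    return [s >= threshold for s in scores]


# Hypothetical per-frame outputs for a single face track (not real data).
tcn_scores = [0.9, 0.8, 0.2, 0.1]   # temporal-convolution classifier
lstm_scores = [0.7, 0.6, 0.4, 0.3]  # LSTM classifier

fused = ensemble_scores([tcn_scores, lstm_scores])
labels = is_speaking(fused)
```

Averaging is only one possible choice; weighted averaging or fusing logits instead of probabilities would fit the same interface.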
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Active Speaker Detection | AVA-ActiveSpeaker (val) | mAP 87.8 | 107 |
| Active Speaker Detection | AVA-ActiveSpeaker v1.0 (val) | mAP 87.8 | 27 |
| Active Speaker Detection | AVA-ActiveSpeaker (test) | mAP 87.8 | 22 |
| Active Speaker Detection | AVA-ActiveSpeaker v1.0 (test) | mAP 87.8 | 13 |
| Active Speaker Detection | AVA-ActiveSpeaker ActivityNet Challenge 2019 (test) | mAP 87.8 | 9 |
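The results above are reported as mAP. As a rough illustration of what that metric measures, the sketch below computes non-interpolated average precision for one class from per-frame scores and binary ground-truth labels. This is an assumption-laden simplification: the official AVA-ActiveSpeaker evaluation tool may use a different interpolation or tie-breaking scheme, and the function name and example values here are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision for a single class.

    scores: predicted speaking probabilities, one per frame.
    labels: ground truth, 1 for speaking and 0 for not speaking.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(1 for _, y in ranked if y)
    if n_positive == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            # Precision measured at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / n_positive


# Hypothetical example: positives are ranked 1st and 3rd out of 4 frames.
ap = average_precision([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

Mean average precision is the mean of this quantity over classes; a ranking that places every speaking frame above every non-speaking frame scores 1.0.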