POSTER++: A simpler and stronger facial expression recognition network
About
Facial expression recognition (FER) plays an important role in a variety of real-world applications such as human-computer interaction. POSTER achieves state-of-the-art (SOTA) performance in FER by effectively combining facial landmark and image features through a two-stream pyramid cross-fusion design. However, POSTER's architecture is complex and incurs a high computational cost. To relieve this computational pressure, in this paper we propose POSTER++. It improves POSTER in three directions: cross-fusion, the two-stream design, and multi-scale feature extraction. In cross-fusion, we replace the vanilla cross-attention mechanism with a window-based cross-attention mechanism. In the two-stream design, we remove the image-to-landmark branch. For multi-scale feature extraction, POSTER++ combines images with the landmark's multi-scale features to replace POSTER's pyramid design. Extensive experiments on several standard datasets show that POSTER++ achieves SOTA FER performance with the minimum computational cost. For example, POSTER++ reaches 92.21% on RAF-DB, 67.49% on AffectNet (7 cls), and 63.77% on AffectNet (8 cls), using only 8.4G floating point operations (FLOPs) and 43.7M parameters (Param). This demonstrates the effectiveness of our improvements.
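To give a rough sense of why window-based attention is cheaper than vanilla attention, the sketch below applies the standard cost estimates from the Swin Transformer paper for global and windowed attention. It assumes the query and key/value feature maps have the same size, so the same counts carry over to cross-attention. The feature-map size, channel width, and window size are hypothetical values chosen for illustration, not POSTER++'s actual configuration, and the function names are made up for this example.

```python
def global_attention_macs(h, w, c):
    """Approximate multiply-accumulates for attention over all h*w tokens.

    4*n*c^2 covers the Q, K, V and output projections;
    2*n^2*c covers Q @ K^T and the attention-weighted sum over V.
    """
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def window_attention_macs(h, w, c, m):
    """Same estimate when attention is restricted to non-overlapping m x m windows.

    Each token attends to only m*m tokens, so the quadratic term
    2*n^2*c shrinks to 2*(m*m)*n*c.
    """
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c


# Hypothetical sizes for illustration only (not taken from the paper).
h, w, c, m = 28, 28, 64, 7

global_cost = global_attention_macs(h, w, c)
window_cost = window_attention_macs(h, w, c, m)

print(f"global attention: {global_cost:,} MACs")
print(f"window attention: {window_cost:,} MACs")
print(f"reduction factor: {global_cost / window_cost:.2f}x")
```

The projection term is identical in both cases. The saving comes entirely from the attention-matrix term, which grows quadratically with the number of tokens for global attention but only linearly once the window size is fixed.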
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Facial Expression Recognition | RAF-DB (test) | Accuracy | 92.21 | 180 |
| Facial Expression Recognition | FERPlus (test) | Accuracy | 0.9228 | 100 |
| Facial Expression Recognition | AffectNet 7-way (test) | Accuracy | 67.49 | 91 |
| Facial Expression Recognition | AffectNet 8-way (test) | Accuracy | 63.77 | 65 |
| Facial Expression Recognition | RAF-DB | Accuracy | 92.21 | 45 |
| Facial Expression Recognition | JAFFE | Accuracy | 96.67 | 36 |
| Facial Expression Recognition | AffWild2 (test) | Accuracy | 69.18 | 33 |
| Facial Expression Recognition | FERPlus | Accuracy | 92.28 | 29 |
| Facial Expression Recognition | AffectNet (test) | Accuracy | 63.76 | 28 |
| Facial Expression Recognition | FERG (test) | Accuracy | 96.36 | 18 |