
EnvSSLAM-FFN: Lightweight Layer-Fused System for ESDD 2026 Challenge

About

Recent advances in generative audio models have enabled high-fidelity environmental sound synthesis, raising serious concerns for audio security. The ESDD 2026 Challenge therefore addresses environmental sound deepfake detection under unseen generators (Track 1) and black-box low-resource detection (Track 2) conditions. We propose EnvSSLAM-FFN, which integrates a frozen SSLAM self-supervised encoder with a lightweight FFN back-end. To effectively capture spoofing artifacts under severe data imbalance, we fuse intermediate SSLAM representations from layers 4-9 and adopt a class-weighted training objective. Experimental results show that the proposed system consistently outperforms the official baselines on both tracks, achieving Test Equal Error Rates (EERs) of 1.20% and 1.05%, respectively.
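The two training-side ideas named in the abstract, fusing SSLAM layers 4-9 and weighting the loss by class, can be illustrated with a minimal NumPy sketch. The abstract does not say how the layers are fused or how the class weights are chosen, so everything below is an assumption for illustration: the fusion is a softmax-weighted sum followed by mean pooling over time, the weights are inverse class frequencies, and the function names (`fuse_layers`, `inverse_frequency_weights`, `weighted_cross_entropy`) are hypothetical. The FFN back-end itself is omitted.

```python
import numpy as np


def fuse_layers(hidden_states, first=4, last=9, layer_logits=None):
    """Fuse encoder layers first..last (inclusive) into one embedding.

    hidden_states has shape (num_layers, time, dim) and is assumed to be
    indexed so that hidden_states[k] is the output of layer k.
    The softmax-weighted sum is an assumed fusion rule, not the paper's.
    """
    selected = hidden_states[first:last + 1]            # (L, T, D)
    if layer_logits is None:
        layer_logits = np.zeros(selected.shape[0])      # uniform weights
    w = np.exp(layer_logits - np.max(layer_logits))
    w = w / w.sum()                                     # softmax over layers
    fused = np.tensordot(w, selected, axes=(0, 0))      # (T, D)
    return fused.mean(axis=0)                           # mean-pool time -> (D,)


def inverse_frequency_weights(labels, num_classes=2):
    """Weight each class by total / (num_classes * count).

    Assumes every class occurs at least once in labels.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts.sum() / (num_classes * counts)


def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy, normalised by the sum of sample weights."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(labels)), labels]
    sample_w = class_weights[labels]
    return -(sample_w * picked).sum() / sample_w.sum()
```

With uniform layer weights the fusion reduces to a plain average of the six selected layers, and the rarer class receives the larger loss weight, which is the usual motivation for a class-weighted objective under imbalance.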

Xiaoxuan Guo, Hengyan Huang, Jiayi Zhou, Renhe Sun, Jian Liu, Haonan Cheng, Long Ye, Qin Zhang • 2025

Related benchmarks

Task                              Dataset                  EER (%)   Rank
Environmental Deepfake Detection  EnvSDD Track 1 (Eval)    1.05      3
Environmental Deepfake Detection  EnvSDD Track 1 (test)    1.20      3
Environmental Deepfake Detection  EnvSDD Track 2 (eval)    1.24      3
Environmental Deepfake Detection  EnvSDD Track 2 (test)    1.05      3
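The results above are reported as Equal Error Rate (EER), the operating point at which the false positive rate and the false negative rate are equal. As a rough illustration of how such a number is obtained from detector scores, here is a minimal sketch. It assumes higher scores mean "more likely fake" and that label 1 marks fake audio; the function name `equal_error_rate` is hypothetical, and the challenge's official scoring script may interpolate between thresholds differently.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping every observed score as a threshold.

    scores: higher means more likely fake (assumed convention).
    labels: 1 = fake, 0 = real (assumed convention).
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    candidates = np.concatenate([np.unique(scores), [np.inf]])
    best_gap, eer = np.inf, 1.0
    for threshold in candidates:
        flagged = scores >= threshold
        fpr = np.mean(flagged[labels == 0])     # real clips flagged as fake
        fnr = np.mean(~flagged[labels == 1])    # fake clips that were missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            # Take the midpoint where the two error rates are closest.
            best_gap, eer = gap, (fpr + fnr) / 2.0
    return eer
```

For example, `equal_error_rate([0.1, 0.6, 0.4, 0.9], [0, 0, 1, 1])` evaluates to 0.5, because at the threshold 0.6 one of the two real clips is flagged and one of the two fake clips is missed.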