Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026
About
This technical report presents our submission to the ICASSP 2026 Environmental Sound Deepfake Detection (ESDD) Challenge. The challenge is built on EnvSDD, a large-scale dataset of synthetic environmental sounds. We address two difficulties posed by the challenge, unseen generators and low-resource black-box scenarios, by proposing an audio-text cross-attention model. Experiments with individual and combined text-audio models show that our systems improve the equal error rate (EER) relative to the challenge baseline, a BEATs+AASIST model.
Candy Olivia Mawalim, Haotian Zhang, Shogo Okada • 2025
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Environmental Sound Deepfake Detection | ESDD Track 2 Black-Box Low-Resource 2026 (val) | EER 0.07 | 4 |
| Environmental Sound Deepfake Detection | ESDD Track 2 (Black-Box Low-Resource) 2026 (test) | EER 11.98 | 4 |
| Environmental Sound Deepfake Detection | EnvSDD Track 1 (Unseen Generators) 2026 (val) | EER 0.07 | 4 |
| Environmental Sound Deepfake Detection | EnvSDD Track 1 (Unseen Generators) 2026 (test) | EER (%) 11.22 | 4 |
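
The results above are reported as equal error rate (EER), the operating point at which the false-positive rate equals the false-negative rate. As a rough illustration of how such a figure can be computed from detector scores, here is a minimal sketch. The function name `compute_eer`, the label convention (1 = fake, 0 = real), and the assumption that higher scores mean "more likely fake" are all illustrative choices; this is not the challenge's official scoring script, which may interpolate between thresholds or use a different convention.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate (in percent) by a threshold sweep.

    Assumptions (illustrative, not taken from the challenge toolkit):
      - labels use 1 for fake (positive class) and 0 for real
      - a higher score means the detector considers the clip more likely fake
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # A clip is predicted fake when its score is >= t.
        fnr = sum(s < t for s in pos) / len(pos)   # fakes missed
        fpr = sum(s >= t for s in neg) / len(neg)  # reals flagged as fake
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            # No interpolation: take the midpoint at the closest crossing.
            best_gap, eer = gap, (fnr + fpr) / 2
    return 100.0 * eer


# Toy example: one real clip scores higher than one fake clip.
print(compute_eer([0.1, 0.6, 0.4, 0.9], [0, 0, 1, 1]))
```

Because the sweep only evaluates thresholds at observed scores, the value is an approximation on small sets; with thousands of trials the difference from an interpolated EER is usually negligible.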