Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026

About

This paper presents our work for the ICASSP 2026 Environmental Sound Deepfake Detection (ESDD) Challenge. The challenge is based on the large-scale EnvSDD dataset, which comprises diverse synthetic environmental sounds. We address two difficult settings, unseen generators and low-resource black-box scenarios, by proposing an audio-text cross-attention model. Experiments with individual and combined audio-text models show competitive equal error rate (EER) improvements over the challenge baseline (a BEATs+AASIST model).

Candy Olivia Mawalim, Haotian Zhang, Shogo Okada • 2025

Related benchmarks

Task	Dataset	Result
Environmental Sound Deepfake Detection	ESDD Track 2 (Black-Box Low-Resource) 2026 (val)	EER 0.07	4
Environmental Sound Deepfake Detection	ESDD Track 2 (Black-Box Low-Resource) 2026 (test)	EER 11.98	4
Environmental Sound Deepfake Detection	EnvSDD Track 1 (Unseen Generators) 2026 (val)	EER 0.07	4
Environmental Sound Deepfake Detection	EnvSDD Track 1 (Unseen Generators) 2026 (test)	EER (%) 11.22	4
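
All results above are reported as EER: the error rate at the decision threshold where the false acceptance rate (fake clips accepted as real) equals the false rejection rate (real clips rejected). The following is a minimal sketch of how EER can be computed from detector scores. It is not the challenge's official scoring script. The function name compute_eer, the label convention (1 = bona fide, 0 = fake), and the score convention (higher score = more likely bona fide) are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (eer, threshold) for a binary real-vs-fake detector.

    Assumed conventions (illustrative only):
      labels: 1 = bona fide (real), 0 = fake
      scores: higher means "more likely bona fide"
    A trial is accepted as bona fide when score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both bona fide and fake trials")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so the "reject everything" operating point is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]

    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in candidates:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_pos
        # False acceptance rate: fake trials scored at or above the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # On a discrete threshold grid FAR and FRR rarely match exactly,
            # so the EER is estimated as their mean at the closest crossing.
            best = (gap, (far + frr) / 2, t)

    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.6, 0.4, 0.7, 0.3, 0.1], [1, 1, 1, 0, 0, 0])
    print(f"EER = {100 * eer:.2f}% at threshold {thr}")
```

Multiplying the returned value by 100 gives EER in percent. The quadratic threshold sweep keeps the logic easy to check. Official toolkits typically sort scores once or interpolate along the ROC curve, so their values can differ slightly from this discrete estimate.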

