ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
About
Pre-trained foundation models have demonstrated remarkable success in audio, vision, and language, yet their potential for general machine signal modeling with arbitrary sampling rates (covering acoustic, vibration, and other industrial sensor data) remains under-explored. In this work, we propose ECHO, a novel foundation model that integrates an advanced band-split architecture with frequency positional embeddings, enabling spectral localization across arbitrary sampling configurations. Moreover, the model incorporates sliding patches to support inputs of variable length without padding or cropping, producing a concise embedding that retains both temporal and spectral fidelity and naturally extends to streaming scenarios. We evaluate our method on a range of machine signal datasets, including previous DCASE Task 2 challenges (2020-2025) and widely used industrial signal corpora. Experimental results demonstrate consistent state-of-the-art performance in machine signal anomaly detection and fault classification, confirming the effectiveness and generalization capability of the proposed model. We have open-sourced ECHO at https://github.com/yucongzh/ECHO.
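The two ideas named in the abstract, band-wise frequency positions and sliding patches over time, can be illustrated with a small NumPy sketch. This is not the ECHO implementation: the function names, the choice of a sinusoidal embedding, the `reference_hz` scaling, and the end-aligned final patch are all assumptions made for illustration. The point is only that positions tied to physical frequency (Hz) stay comparable across sampling rates, and that a sliding window can cover a signal of any length without padding.

```python
import numpy as np

def band_center_frequencies(sample_rate, n_freq_bins, band_width):
    """Center frequency in Hz of each band of `band_width` spectrogram bins.

    Assumes an rfft-style layout where bin 0 is 0 Hz and the last bin is Nyquist.
    Leftover bins that do not fill a whole band are ignored in this sketch.
    """
    bin_hz = (sample_rate / 2.0) / (n_freq_bins - 1)
    starts = np.arange(0, n_freq_bins - band_width + 1, band_width)
    return (starts + (band_width - 1) / 2.0) * bin_hz

def frequency_positional_embedding(center_hz, dim, reference_hz=1000.0):
    """Sinusoidal embedding of a physical frequency (hypothetical choice).

    `reference_hz` is an illustrative scaling constant; `dim` must be even.
    """
    pos = np.asarray(center_hz, dtype=float) / reference_hz
    inv_freq = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    angles = pos[:, None] * inv_freq[None, :]
    emb = np.empty((pos.shape[0], dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def sliding_patch_starts(n_frames, patch_frames, hop_frames):
    """Start indices of sliding patches that cover every frame without padding.

    Assumption: if the regular hops leave a tail, one extra end-aligned patch
    is added so that no frames are cropped.
    """
    if n_frames < patch_frames:
        raise ValueError("signal is shorter than one patch")
    starts = list(range(0, n_frames - patch_frames + 1, hop_frames))
    last = n_frames - patch_frames
    if starts[-1] != last:
        starts.append(last)
    return starts

def extract_patches(spec, band_width, patch_frames, hop_frames):
    """Cut a (bins, frames) spectrogram into (bands, patches, band_width, patch_frames)."""
    n_bins, n_frames = spec.shape
    band_starts = range(0, n_bins - band_width + 1, band_width)
    t_starts = sliding_patch_starts(n_frames, patch_frames, hop_frames)
    return np.array([[spec[b:b + band_width, t:t + patch_frames] for t in t_starts]
                     for b in band_starts])

# Two sampling rates analyzed with the same bin spacing (31.25 Hz per bin).
centers_16k = band_center_frequencies(16000, 257, 32)
centers_32k = band_center_frequencies(32000, 513, 32)
emb_16k = frequency_positional_embedding(centers_16k, dim=16)
emb_32k = frequency_positional_embedding(centers_32k, dim=16)
patches = extract_patches(np.zeros((257, 11)), band_width=32, patch_frames=4, hop_frames=3)
```

In this sketch the 32 kHz signal simply gains extra high-frequency bands, while the bands it shares with the 16 kHz signal receive identical positional embeddings. The number of time patches grows with signal length, so a longer recording yields more patches rather than a padded or truncated input.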
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Anomalous Sound Detection | DCASE 2023 | Dataset-wise Harmonic Mean | 63.7 | 16 |
| Anomalous Sound Detection | DCASE 2024 | Dataset-wise Harmonic Mean | 57.9 | 16 |
| Anomalous Sound Detection | DCASE 2020 | Dataset-wise Harmonic Mean | 72.2 | 16 |
| Fault Classification | SIREN | IIEE Accuracy (44.1k) | 100 | 15 |
| Anomaly Detection | SIREN DCASE Tasks 2020-2025 | Performance 2020 (16k) | 72.23 | 15 |
| Anomalous Sound Detection | DCASE 2022 | Dataset-wise Harmonic Mean | 60 | 12 |
| Anomalous Sound Detection | DCASE 2025 | Dataset-wise Harmonic Mean | 58.7 | 7 |
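Several rows report a harmonic mean. As a reminder of how that aggregate behaves, the short sketch below computes one over three made-up scores. The values are purely illustrative and are not taken from the table or from any ECHO result, and which individual scores enter the "dataset-wise" mean is not stated here.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-subset scores, for illustration only.
scores = [80.0, 60.0, 48.0]

hm = harmonic_mean(scores)  # n / sum(1 / x): pulled toward the lowest score
am = mean(scores)           # arithmetic mean, for comparison
```

Because the harmonic mean is dominated by the smallest values, a model cannot compensate for one weak subset with a strong result elsewhere, which is why it is a common choice for aggregating detection scores.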