HuPER: A Human-Inspired Framework for Phonetic Perception
About
We propose HuPER, a human-inspired framework that models phonetic perception as adaptive inference over acoustic-phonetic evidence and linguistic knowledge. With only 100 hours of training data, HuPER achieves state-of-the-art phonetic error rates on five English benchmarks and strong zero-shot transfer to 95 unseen languages. HuPER is also the first framework to enable adaptive, multi-path phonetic perception under diverse acoustic conditions. All training data, models, and code are open-sourced. Code and demo available at https://github.com/HuPER29/HuPER.
Chenxu Guo, Jiachen Lian, Yisi Liu, Baihe Huang, Shriyaa Narayanan, Cheol Jun Cho, Gopala Anumanchipalli • 2026
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Phone feature recognition | Buckeye (sociophonetic) | PFER 7.36 | 25 |
| Phone recognition | PRiSM Accented English Datasets | PFER (Timing) 8.3 | 12 |
| Phone recognition | PRiSM Multilingual Datasets | PFER (DRC) 32 | 12 |
| Phonetic perception | DRC-SE (DoReCo South-England) | PFER 0.0908 | 8 |
| Phonetic perception | L2-ARCTIC | PFER 8 | 8 |
| Phonetic perception | SO762 (SpeechOcean762) | PFER 9 | 8 |
| Phonetic perception | EpaDB | PFER 0.1066 | 8 |