HuPER: A Human-Inspired Framework for Phonetic Perception
About
We propose HuPER, a human-inspired framework that models phonetic perception as adaptive inference over acoustic-phonetic evidence and linguistic knowledge. With only 100 hours of training data, HuPER achieves state-of-the-art phonetic error rates on five English benchmarks and strong zero-shot transfer to 95 unseen languages. HuPER is also the first framework to enable adaptive, multi-path phonetic perception under diverse acoustic conditions. All training data, models, and code are open-sourced. Code and demo available at https://github.com/HuPER29/HuPER.
Chenxu Guo, Jiachen Lian, Yisi Liu, Baihe Huang, Shriyaa Narayanan, Cheol Jun Cho, Gopala Anumanchipalli • 2026
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Phone feature recognition | Buckeye (sociophonetic) | PFER 7.36 | 25 |
| Phone recognition | PRiSM Accented English Datasets | PFER (Timing) 8.3 | 12 |
| Phone recognition | PRiSM Multilingual Datasets | PFER (DRC) 32 | 12 |
| Phonetic perception | DRC-SE (DoReCo South-England) | PFER 0.0908 | 8 |
| Phonetic perception | L2-ARCTIC | PFER 8 | 8 |
| Phonetic perception | SO762 (SpeechOcean762) | PFER 9 | 8 |
| Phonetic perception | EpaDB | PFER 0.1066 | 8 |