Multilingual Phonological Feature Recognition with Self-Supervised Speech Models

About

Phonological features provide a language-general and linguistically grounded representation of speech. We present PhonoQ-2.0, a multilingual frame-level phonological feature recognizer built on self-supervised speech models. The system directly predicts a structured 22-dimensional feature vector per frame encoding manner, vowel quality, place, and voicing, instead of deriving features from phoneme outputs. To ensure phonologically coherent predictions, we introduce a manner-conditioned gating mechanism that activates valid feature groups. Evaluated across multiple languages and corpora, PhonoQ-2.0 achieves an average macro-F1 of 91.3% in-domain and 88.9% out-of-domain. Compared to a strong CTC phoneme baseline, it delivers consistent gains of +8.8 F1 in-domain and +8.6 out-of-domain on average. In unseen-language evaluation, PhonoQ-2.0 improves macro-F1 from 66.9% to 73.6% (+6.7 on average), with gains of up to +10.8 points.
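The abstract does not spell out how the manner-conditioned gating works, so the following is only a minimal sketch of one way such a gate could look, not the authors' implementation. Everything specific here is an assumption made for illustration: the group sizes within the 22 dimensions, the manner inventory in `MANNERS`, the validity table `VALID`, the choice of a soft (probability-weighted) gate, and the function name `gate_frame`.

```python
import math

# Hypothetical layout of the 22-dimensional frame vector.
# The group sizes are assumptions; the source only names the four groups.
GROUPS = {
    "manner": range(0, 8),
    "vowel_quality": range(8, 14),
    "place": range(14, 21),
    "voicing": range(21, 22),
}

# Hypothetical manner inventory, one class per manner dimension.
MANNERS = ["silence", "vowel", "stop", "fricative",
           "affricate", "nasal", "liquid", "glide"]

# Hypothetical validity table: which feature groups each manner licenses.
VALID = {m: {"place", "voicing"} for m in MANNERS}
VALID["vowel"] = {"vowel_quality", "voicing"}
VALID["silence"] = set()


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_frame(logits):
    """Return 22 gated probabilities for one frame of raw logits."""
    if len(logits) != 22:
        raise ValueError("expected 22 logits per frame")
    manner_probs = softmax([logits[i] for i in GROUPS["manner"]])
    out = [0.0] * 22
    for i, p in zip(GROUPS["manner"], manner_probs):
        out[i] = p
    for group in ("vowel_quality", "place", "voicing"):
        # Soft gate: total probability of the manners that license this group.
        gate = sum(p for m, p in zip(MANNERS, manner_probs) if group in VALID[m])
        for i in GROUPS[group]:
            out[i] = gate * sigmoid(logits[i])
    return out
```

With this sketch, a frame whose manner distribution is dominated by the vowel class keeps its vowel-quality and voicing outputs while its place outputs are driven toward zero, which is the kind of phonological coherence the abstract describes.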

Abner Hernandez, Tomás Arias-Vergara, Daiqi Liu, Andreas Maier, Paula Andrea Pérez-Toro • 2026

Related benchmarks

Task | Dataset | Result | Rank
Phonological Feature Prediction | FLEURS German | Manner Group F1: 86 | 2
Phonological Feature Prediction | VoxPopuli German | Manner Group F1: 80.6 | 2
Phonological Feature Prediction | FLEURS Spanish | Manner Group F1: 93.2 | 2
Phonological Feature Prediction | VoxPopuli Spanish | Manner Group F1: 90.3 | 2
Phonological Feature Prediction | CommonVoice Czech (test) | Manner F1: 90.9 | 2
Phonological Feature Prediction | FLEURS Czech | Manner F1: 89.2 | 2
Phonological Feature Prediction | VoxPopuli Czech | Manner F1 Score: 91.8 | 2
Phonological Feature Prediction | CommonVoice German (test) | Manner F1 Score: 88.9 | 2
Phonological Feature Prediction | FLEURS German | Manner F1: 86 | 2
Phonological Feature Prediction | CommonVoice English (test) | Manner F1: 90.5 | 2
Showing 10 of 17 rows
