SSDM: Scalable Speech Dysfluency Modeling

About

Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, three challenges remain. First, current state-of-the-art solutions [lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm] suffer from poor scalability. Second, there is a lack of a large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at https://berkeley-speech-group.github.io/SSDM/.

Jiachen Lian, Xuanru Zhou, Zoe Ezzes, Jet Vonk, Brittany Morin, David Baquirin, Zachary Miller, Maria Luisa Gorno Tempini, Gopala Krishna Anumanchipalli • 2024

Related benchmarks

Task                    Dataset           Result          Rank
Phonetic Transcription  VCTK++ (test)     F1 Score: 93    25
Phonetic Transcription  Libri-Dys (test)  F1 Score: 90.8  25
Dysfluency Detection    VCTK++            F1 Score: 90    7
Dysfluency Detection    Libri-Dys         F1 Score: 81.6  7
Dysfluency Detection    nfvPPA            F1 Score: 69.9  7
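
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed from raw counts. The function name and the example counts are hypothetical and are not taken from the paper. This page does not say how SSDM counts a match (for example, per frame, per phoneme, or per dysfluency event) or how scores are averaged, so the sketch covers only the generic definition.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Generic F1: harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive and false-negative counts.
    What counts as a "positive" (frame, phoneme, dysfluency event) is an
    assumption left to the caller; this page does not specify it.
    """
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
score = f1_score(tp=90, fp=10, fn=10)
print(f"F1 = {100 * score:.1f}")  # reported on a 0-100 scale, as in the table
```

With these illustrative counts, precision and recall are both 0.9, so the F1 score is also about 0.9, which corresponds to 90 on the 0-100 scale used in the table.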
