
pyannote.audio: neural building blocks for speaker diarization

About

We introduce pyannote.audio, an open-source Python toolkit for speaker diarization. Built on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also ships with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.

Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill • 2019
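The abstract describes diarization as a pipeline of separately trainable blocks (voice activity detection, speaker change detection, speaker embedding) whose outputs are combined. The following is a minimal sketch of one way such outputs could be chained, assuming frame-level speech scores, a list of change-point times, and one embedding vector per segment are already available. All names (`binarize`, `split_at_changes`, `assign_speakers`), the onset/offset thresholds, and the greedy cosine-similarity clustering are illustrative assumptions; they are not pyannote.audio's API or the clustering method used in the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    start: float  # seconds
    end: float  # seconds
    label: str = ""


def binarize(scores: Sequence[float], frame_step: float,
             onset: float = 0.7, offset: float = 0.5) -> List[Segment]:
    """Hysteresis thresholding of frame-level speech scores (hypothetical helper).

    A region opens when a score rises above `onset` and closes when a score
    falls below `offset`, so brief dips between the two thresholds do not split it.
    """
    regions: List[Segment] = []
    active, start = False, 0.0
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            active, start = True, t
        elif active and score < offset:
            regions.append(Segment(start, t))
            active = False
    if active:  # close a region still open at the end of the signal
        regions.append(Segment(start, len(scores) * frame_step))
    return regions


def split_at_changes(regions: List[Segment], change_times: Sequence[float]) -> List[Segment]:
    """Cut each speech region at every speaker change point that falls inside it."""
    segments: List[Segment] = []
    for region in regions:
        cuts = sorted(t for t in change_times if region.start < t < region.end)
        bounds = [region.start, *cuts, region.end]
        segments.extend(Segment(a, b) for a, b in zip(bounds, bounds[1:]))
    return segments


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def assign_speakers(segments: List[Segment], embeddings: Sequence[Sequence[float]],
                    threshold: float = 0.8) -> List[Segment]:
    """Greedy online clustering (a simplification, not the paper's method):
    each segment joins the most similar speaker centroid if the cosine
    similarity reaches `threshold`, otherwise it starts a new speaker."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for segment, emb in zip(segments, embeddings):
        best, best_sim = -1, -1.0
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best < 0 or best_sim < threshold:
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:  # update the running mean of the chosen speaker's embeddings
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
        segment.label = f"SPEAKER_{best:02d}"
    return segments


if __name__ == "__main__":
    speech_scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.6, 0.9, 0.4, 0.1, 0.8, 0.9, 0.3]
    regions = binarize(speech_scores, frame_step=0.5)
    segments = split_at_changes(regions, change_times=[2.0])
    embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    for seg in assign_speakers(segments, embeddings):
        print(f"{seg.label} {seg.start:.1f}-{seg.end:.1f}")
    # prints:
    # SPEAKER_00 1.0-2.0
    # SPEAKER_01 2.0-3.5
    # SPEAKER_00 4.5-5.5
```

Separate onset and offset thresholds keep short dips in the speech score from fragmenting a region. The greedy clustering step is used here only for brevity; the thresholds would normally be tuned on development data rather than fixed as above.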

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Speaker Diarization | AISHELL-4 | DER (%) | 6.27 | 20 |
| Speaker Diarization | AMI | DER | 15.41 | 15 |
| Speaker Diarization | RAMC | DER | 12.97 | 9 |
| Speaker Diarization | MSDWild | DER | 12.25 | 6 |
| Speaker Diarization | VoxConverse | DER | 6.81 | 6 |
| Speaker-attributed Automatic Speech Recognition | Movies (test) | CER | 9.94 | 6 |
| Speaker Diarization | AliMeeting | DER | 15.67 | 6 |
| Speaker-attributed Automatic Speech Recognition | AISHELL-4 (test) | CER | 0.1818 | 4 |
| Speaker-attributed Automatic Speech Recognition | Podcast (test) | CER | 7.93 | 4 |
| Overlapped Speech Detection | AMI (test) | Precision | 91.9 | 3 |

Only 10 of the 33 benchmark rows are shown.
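Most rows above report diarization error rate (DER), the fraction of reference speech time that is missed, falsely detected as speech, or attributed to the wrong speaker: DER = (false alarm + missed detection + speaker confusion) / total reference speech. The sketch below computes a simplified frame-level DER, assuming at most one speaker per frame, no forgiveness collar, and a brute-force search for the best one-to-one mapping between hypothesis and reference speaker labels (practical only for a handful of speakers). The function name `frame_der` and its input format are hypothetical; established scorers such as pyannote.metrics additionally handle overlapping speech and collars.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence

Label = Optional[Hashable]  # None marks a non-speech frame


def frame_der(reference: Sequence[Label], hypothesis: Sequence[Label]) -> float:
    """Simplified frame-level DER: (false alarm + miss + confusion) / reference speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    total = sum(r is not None for r in reference)
    if total == 0:
        raise ValueError("reference contains no speech")
    miss = sum(r is not None and h is None for r, h in zip(reference, hypothesis))
    false_alarm = sum(r is None and h is not None for r, h in zip(reference, hypothesis))

    # Co-occurrence counts of (hypothesis label, reference label) on frames where both speak.
    both = [(h, r) for r, h in zip(reference, hypothesis) if r is not None and h is not None]
    cooc = Counter(both)
    hyp_labels = sorted({h for h, _ in both}, key=str)
    ref_labels = sorted({r for _, r in both}, key=str)

    # Pad with None so surplus hypothesis speakers can remain unmapped;
    # keep the one-to-one mapping that maximizes the matched speech frames.
    pool = ref_labels + [None] * max(0, len(hyp_labels) - len(ref_labels))
    best_map, best_score = {}, -1
    for perm in permutations(pool, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        score = sum(cooc[(h, m)] for h, m in mapping.items() if m is not None)
        if score > best_score:
            best_map, best_score = mapping, score

    # Speech frames whose mapped hypothesis label disagrees with the reference.
    confusion = sum(best_map.get(h) != r for h, r in both)
    return (false_alarm + miss + confusion) / total


if __name__ == "__main__":
    ref = ["A", "A", "A", "B", "B", None, None, "B"]
    hyp = ["s1", "s1", "s2", "s2", "s2", "s2", None, None]
    print(frame_der(ref, hyp))  # prints 0.5 (1 false alarm + 1 miss + 1 confusion over 6 frames)
```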
