pyannote.audio: neural building blocks for speaker diarization

About

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Built on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.

Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill, 2019
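
To make the idea of combinable building blocks concrete, here is a minimal sketch of how the output of a voice activity detector and a speaker change detector might be chained. It is not the pyannote.audio API: the function names `speech_segments` and `split_at_changes`, the 0.5 threshold, and the frame duration are all assumptions made for illustration, and the per-frame scores are taken as given rather than produced by a neural model.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def speech_segments(scores: List[float], threshold: float = 0.5,
                    frame_duration: float = 0.01) -> List[Segment]:
    """Turn per-frame speech scores into (start, end) speech segments.

    Hypothetical helper: a frame counts as speech when its score is at or
    above `threshold`; consecutive speech frames are merged into one segment.
    """
    segments: List[Segment] = []
    start = None  # index of the first frame of the current speech run
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start * frame_duration, i * frame_duration))
            start = None
    if start is not None:  # speech runs until the last frame
        segments.append((start * frame_duration, len(scores) * frame_duration))
    return segments


def split_at_changes(segments: List[Segment], changes: List[float]) -> List[Segment]:
    """Split each speech segment at the change points that fall strictly inside it."""
    result: List[Segment] = []
    for start, end in segments:
        cuts = [start] + sorted(c for c in changes if start < c < end) + [end]
        result.extend(zip(cuts[:-1], cuts[1:]))
    return result
```

For example, `split_at_changes(speech_segments(scores), changes)` yields speaker-homogeneous speech turns. In a full pipeline, each turn would then typically be assigned a speaker label by clustering speaker embeddings, a step this sketch leaves out.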

Related benchmarks

Task	Dataset	Result
Speaker-attributed Automatic Speech Recognition	AISHELL-4 (test)	cpCER 0.2786	33
Speaker Diarization	AMI	DER 15.41	28
Speaker Diarization	AISHELL-4	DER (%) 6.27	24
Speaker Diarization	RAMC	DER 12.97	13
Speaker Diarization	MSDWild	DER 12.25	10
Speaker Diarization	VoxConverse	DER 6.81	10
Speaker Diarization	AliMeeting	DER 15.67	10
Speaker Diarization	AMI (test)	DER 24.8	8
Speaker-attributed Automatic Speech Recognition	Movies (test)	CER 9.94	6
Speaker Diarization	VoxConverse (test)	--	5

Showing 10 of 36 rows
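
Most rows above report DER (diarization error rate), usually defined as the sum of missed speech, false alarm speech, and speaker confusion, divided by the total duration of reference speech. The sketch below computes a simplified, frame-based version of that quantity. It rests on several assumptions that are not taken from this page: each frame carries at most one speaker label (no overlapped speech), `None` marks non-speech, and the best one-to-one speaker mapping is found by brute force, which is only practical for a handful of speakers. The function name `frame_der` is hypothetical.

```python
from itertools import permutations
from typing import List, Optional


def frame_der(reference: List[Optional[str]],
              hypothesis: List[Optional[str]]) -> float:
    """Frame-based diarization error rate (simplified sketch).

    Each list holds one speaker label per frame, or None for non-speech.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    total = sum(r is not None for r in reference)
    if total == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    # Frames where both reference and hypothesis contain speech.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})
    # Pad the shorter speaker list with None so that every one-to-one
    # mapping between the two label sets can be enumerated.
    size = max(len(ref_speakers), len(hyp_speakers))
    padded_ref = ref_speakers + [None] * (size - len(ref_speakers))
    padded_hyp = hyp_speakers + [None] * (size - len(hyp_speakers))

    best_correct = 0
    for perm in permutations(padded_ref):
        mapping = {h: r for h, r in zip(padded_hyp, perm) if h is not None}
        correct = sum(mapping.get(h) == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total
```

The function returns a fraction, so multiply by 100 to compare with percentages. Published benchmark scores are generally computed on time intervals rather than frames, and evaluation setups differ (for instance in whether overlapped speech is scored), so figures from this sketch are not directly comparable to the table.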
