
CREPE: A Convolutional Representation for Pitch Estimation

About

The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing, with applications in speech processing and music information retrieval. To date, the best-performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on average, there remain many cases in which they fail to correctly estimate the pitch. In this paper, we propose a data-driven pitch tracking algorithm, CREPE, which is based on a deep convolutional neural network that operates directly on the time-domain waveform. We show that the proposed model produces state-of-the-art results, performing as well as or better than pYIN. Furthermore, we evaluate the model's generalizability in terms of noise robustness. A pre-trained version of CREPE is made freely available as an open-source Python module for easy application.
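To make the contrast concrete: the classical DSP pipelines the abstract refers to (the family pYIN belongs to) typically estimate the fundamental frequency of a frame from its periodicity. The sketch below is a deliberately naive autocorrelation-based estimator, shown only to illustrate that approach; it is not CREPE's method, and the frame length, sample rate, and search range are illustrative choices.

```python
import math

def autocorr_pitch(frame, sr, fmin=50.0, fmax=1000.0):
    """Naive f0 estimate for one monophonic frame: pick the lag
    (within the plausible pitch range) that maximizes the
    unnormalized autocorrelation, then convert lag -> frequency."""
    lag_min = int(sr / fmax)                      # shortest period considered
    lag_max = min(int(sr / fmin), len(frame) - 1) # longest period considered
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sr / best_lag

# Synthetic test signal: a 440 Hz sine at 16 kHz.
sr = 16000
frame = [math.sin(2 * math.pi * 440.0 * n / sr) for n in range(1024)]
print(autocorr_pitch(frame, sr))  # close to 440 Hz (quantized to integer lags)
```

Integer-lag quantization and octave errors are exactly the kinds of failure modes that heuristic pipelines patch with post-processing, and that a learned model like CREPE aims to avoid by predicting pitch directly from the waveform.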

Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello • 2018

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Environmental Sound Classification | FSD50K | mAP | 15.9 | 60 |
| Voiced/Unvoiced Detection | Speech | V/UV Recall | 87.98 | 50 |
| Audio Representation Evaluation | HEAR (Holistic Evaluation of Audio Representations) | CREMA-D | 36.2 | 35 |
| Environmental Sound Classification | ESC | Top-1 Acc | 29.4 | 28 |
| Environmental Sound Classification | Gunshot triangulation | Top-1 Acc | 91.7 | 23 |
| Beijing Opera percussion classification | Beijing Opera | Top-1 Acc | 93.2 | 22 |
| Music genre and Speech vs Music classification | GTZAN | Genre Accuracy | 64.5 | 22 |
| Percussion stroke and tonic classification | Mridangam | Stroke Accuracy | 88.7 | 22 |
| Sound Event Detection | DCASE HEAR challenge | Onset FMS | 55.2 | 20 |
| Fundamental Frequency Estimation | Speech, Singing Voice, and Music Clean | RPA (5 cents) | 0.1307 | 12 |
Showing 10 of 26 rows
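The pitch-tracking row above reports RPA, i.e. raw pitch accuracy: the fraction of voiced frames whose estimated pitch falls within a given threshold, in cents, of the ground truth (here a strict 5-cent threshold; 50 cents, a half semitone, is the more common setting). A minimal sketch of this standard metric, using the usual cents conversion 1200 · log2(f_est / f_true); the function names and the example values are illustrative:

```python
import math

def cents_diff(f_est, f_true):
    """Pitch difference in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_est / f_true)

def raw_pitch_accuracy(est, true, threshold=50.0):
    """Fraction of voiced frames (true f0 > 0) whose estimate is
    within `threshold` cents of the ground-truth pitch."""
    voiced = [(e, t) for e, t in zip(est, true) if t > 0]
    hits = sum(1 for e, t in voiced if abs(cents_diff(e, t)) <= threshold)
    return hits / len(voiced)

# Two voiced frames: the first estimate is exact, the second is
# ~100 cents flat (440 Hz estimated where A#4 ~ 466.16 Hz is true).
print(raw_pitch_accuracy([440.0, 440.0], [440.0, 466.16]))  # 0.5
```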
