Voting-based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection

About

The voting method, an ensemble approach for fundamental frequency estimation, is empirically known for its robustness but lacks thorough investigation. This paper provides a principled analysis and improvement of this technique. First, we offer a theoretical basis for its effectiveness, explaining the error variance reduction for fundamental frequency estimation and invoking Condorcet's jury theorem for voiced/unvoiced detection accuracy. To address its practical limitations, we propose two key improvements: 1) a pre-voting alignment procedure to correct temporal and frequential biases among estimators, and 2) a greedy algorithm to select a compact yet effective subset of estimators based on error correlation. Experiments on a diverse dataset of speech, singing, and music show that our proposed method with alignment outperforms individual state-of-the-art estimators in clean conditions and maintains robust voiced/unvoiced detection in noisy environments.
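The two ideas in the abstract, combining several estimators by voting and appealing to Condorcet's jury theorem for voiced/unvoiced decisions, can be illustrated with a minimal sketch. The abstract does not specify the paper's voting rule, so the rule below is an assumption: a strict majority vote for voicing and a median in the log-frequency domain for the F0 value. The function names `vote_f0` and `majority_accuracy` are hypothetical, and the alignment and estimator-selection steps are not shown.

```python
import math

import numpy as np


def vote_f0(f0_tracks):
    """Combine F0 tracks from several estimators (assumed voting rule).

    f0_tracks: array-like of shape (n_estimators, n_frames), in Hz,
    where 0 marks a frame the estimator considers unvoiced.
    Returns (f0, is_voiced) with f0 set to 0 on unvoiced frames.
    """
    f0 = np.asarray(f0_tracks, dtype=float)
    voiced = f0 > 0
    n_estimators, n_frames = f0.shape

    # Voiced/unvoiced decision: strict majority of the estimators.
    is_voiced = voiced.sum(axis=0) * 2 > n_estimators

    # F0 value: median over the estimators that voted "voiced",
    # taken in log2 frequency so octave errors count as outliers.
    log_f0 = np.where(voiced, np.log2(np.where(voiced, f0, 1.0)), np.nan)
    out = np.zeros(n_frames)
    for t in np.flatnonzero(is_voiced):
        out[t] = 2.0 ** np.nanmedian(log_f0[:, t])
    return out, is_voiced


def majority_accuracy(p, n):
    """Condorcet's jury theorem, finite case.

    Probability that a strict majority of n independent voters is
    correct when each voter is correct with probability p (n odd).
    """
    k_min = n // 2 + 1
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    tracks = [
        [100.0, 0.0, 200.0],
        [110.0, 0.0, 0.0],
        [400.0, 150.0, 0.0],  # first frame is far off from the other two
    ]
    f0, is_voiced = vote_f0(tracks)
    print(f0, is_voiced)
    print(majority_accuracy(0.8, 3))
```

In the toy input, the 400 Hz outlier in the first frame does not affect the voted value, because the median picks the middle estimate. With three independent detectors that are each correct 80% of the time, the majority is correct with probability 0.896. The jury theorem assumes independent errors, which is why the abstract's correlation-aware selection of estimators matters in practice.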

Junya Koguchi, Tomoki Koriyama • 2026

Related benchmarks

Task	Dataset	Result
Voiced/Unvoiced Detection	Speech	V/UV Recall 94.21	50
Fundamental Frequency Estimation	Speech, Singing Voice, and Music Clean	RPA (5 cents) 0.2901	12
Fundamental Frequency Estimation	Speech SNR 30 dB	RPA50 71.9	10
Fundamental Frequency Estimation	Speech SNR ∞	RPA50 76.78	10
Fundamental Frequency Estimation	Speech SNR 10 dB	RPA50 61.5	10
Fundamental Frequency Estimation	Speech SNR 20 dB	RPA50 60.4	10
Fundamental Frequency Estimation	Speech SNR 0 dB	RPA50 42.27	10
