
Voting-based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection

About

The voting method, an ensemble approach for fundamental frequency estimation, is empirically known for its robustness but lacks thorough investigation. This paper provides a principled analysis and improvement of this technique. First, we offer a theoretical basis for its effectiveness, explaining the error variance reduction for fundamental frequency estimation and invoking Condorcet's jury theorem for voiced/unvoiced detection accuracy. To address its practical limitations, we propose two key improvements: 1) a pre-voting alignment procedure to correct temporal and frequential biases among estimators, and 2) a greedy algorithm to select a compact yet effective subset of estimators based on error correlation. Experiments on a diverse dataset of speech, singing, and music show that our proposed method with alignment outperforms individual state-of-the-art estimators in clean conditions and maintains robust voiced/unvoiced detection in noisy environments.
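The two voting ideas named in the abstract can be illustrated with a minimal sketch. This page does not give the paper's actual fusion rule, so the choices here are assumptions: a strict majority decides voiced/unvoiced, and the median of the voiced estimates gives the fused F0. The function names (`majority_correct_probability`, `vote_frame`, `vote_track`) are hypothetical. The first function evaluates the binomial sum behind Condorcet's jury theorem for independent voters that are each correct with probability `p`.

```python
from math import comb
from statistics import median


def majority_correct_probability(n_voters: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Each voter is assumed correct with the same probability p (the setting of
    Condorcet's jury theorem). Use an odd n_voters to avoid ties.
    """
    k_min = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(k_min, n_voters + 1)
    )


def vote_frame(f0_values):
    """Fuse one frame of per-estimator F0 values in Hz (0.0 means unvoiced).

    Assumed rule, not taken from the paper: the frame is voiced only if a
    strict majority of estimators say so, and the fused F0 is the median of
    the voiced estimates.
    """
    voiced = [f for f in f0_values if f > 0]
    if 2 * len(voiced) <= len(f0_values):
        return 0.0
    return median(voiced)


def vote_track(tracks):
    """Fuse several equal-length F0 tracks frame by frame."""
    return [vote_frame(frame) for frame in zip(*tracks)]
```

With three estimators that are each right 70% of the time, `majority_correct_probability(3, 0.7)` is about 0.784, and it rises as more independent voters are added. Real estimators have correlated errors, which is the motivation the abstract gives for selecting estimators by error correlation.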

Junya Koguchi, Tomoki Koriyama • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Voiced/Unvoiced Detection | Speech | V/UV Recall: 94.21 | 50 |
| Fundamental Frequency Estimation | Speech, Singing Voice, and Music Clean | RPA (5 cents): 0.2901 | 12 |
| Fundamental Frequency Estimation | Speech SNR 30 dB | RPA50: 71.9 | 10 |
| Fundamental Frequency Estimation | Speech SNR ∞ | RPA50: 76.78 | 10 |
| Fundamental Frequency Estimation | Speech SNR 10 dB | RPA50: 61.5 | 10 |
| Fundamental Frequency Estimation | Speech SNR 20 dB | RPA50: 60.4 | 10 |
| Fundamental Frequency Estimation | Speech SNR 0 dB | RPA50: 42.27 | 10 |
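The benchmark rows above report RPA at two tolerances but the page does not define the metric. The sketch below assumes RPA means raw pitch accuracy: the share of reference-voiced frames whose estimate lies within a tolerance, in cents, of the reference F0. Reading "RPA50" as a 50-cent tolerance is also an assumption, and the function name `raw_pitch_accuracy` is hypothetical.

```python
from math import log2


def raw_pitch_accuracy(ref_hz, est_hz, threshold_cents=50.0):
    """Fraction of reference-voiced frames estimated within threshold_cents.

    ref_hz and est_hz are equal-length F0 sequences in Hz; 0.0 marks an
    unvoiced frame. Frames that are unvoiced in the reference are skipped,
    and an unvoiced estimate on a voiced frame counts as a miss.
    """
    hits = 0
    voiced_frames = 0
    for ref, est in zip(ref_hz, est_hz):
        if ref <= 0:
            continue
        voiced_frames += 1
        # Deviation in cents: 1200 cents per octave.
        if est > 0 and abs(1200.0 * log2(est / ref)) <= threshold_cents:
            hits += 1
    return hits / voiced_frames if voiced_frames else 0.0
```

The function returns a fraction between 0 and 1; multiply by 100 for a percentage. Tightening `threshold_cents` from 50 to 5 makes the metric much stricter, since a 1% frequency error is already about 17 cents.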
