VorTEX: Various overlap ratio for Target speech EXtraction

About

Target speech extraction (TSE) aims to recover a target speaker's voice from a mixture. While recent text-prompted approaches have shown promise, most assume fully overlapped mixtures, which limits insight into their behavior across realistic overlap ratios. We introduce VorTEX (Various overlap ratio for Target speech EXtraction), a text-prompted TSE architecture with a Decoupled Adaptive Multi-branch (DAM) Fusion block that separates primary extraction from auxiliary regularization pathways. To enable controlled analysis, we construct PORTE, a two-speaker dataset spanning overlap ratios from 0% to 100%. We further propose Suppression Ratio on Energy (SuRE), a diagnostic metric that detects suppression behavior not captured by conventional measures. Experiments show that existing models exhibit suppression or residual interference under overlap, whereas VorTEX achieves the highest separation fidelity across 20-100% overlap (e.g., 5.50 dB at 20% and 2.04 dB at 100%) while maintaining zero SuRE, indicating robust extraction without suppression-driven artifacts.
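To make the notion of an overlap ratio concrete, here is a minimal sketch of mixing two utterances at a chosen ratio. It is an illustration only: the abstract does not state how PORTE defines or constructs overlap, so the definition used here (overlapped samples divided by utterance length, with equal-length utterances and the interferer delayed) and the helper name `mix_with_overlap` are assumptions.

```python
def mix_with_overlap(target, interferer, overlap_ratio):
    """Mix two equal-length signals so that a given fraction of each overlaps.

    Assumed definition (not taken from the paper): overlap_ratio is the
    number of overlapped samples divided by the utterance length.
    Returns the mixture and the target zero-padded to the mixture length.
    """
    if len(target) != len(interferer):
        raise ValueError("this sketch assumes equal-length signals")
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must lie in [0, 1]")

    n = len(target)
    # Delay the interferer so that only the last part of the target overlaps it.
    offset = n - round(overlap_ratio * n)

    mixture = [0.0] * (n + offset)
    for i, sample in enumerate(target):
        mixture[i] += sample
    for i, sample in enumerate(interferer):
        mixture[offset + i] += sample

    padded_target = list(target) + [0.0] * offset
    return mixture, padded_target


# Toy signals: constant values make the overlapped region easy to see.
toy_target = [1.0] * 10
toy_interferer = [2.0] * 10
toy_mixture, toy_reference = mix_with_overlap(toy_target, toy_interferer, 0.2)
```

With a ratio of 0.0 the two utterances are simply concatenated, and with 1.0 they are summed sample by sample, which corresponds to the fully overlapped setting that the abstract says most prior work assumes.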

Ro-hoon Oh, Jihwan Seol, Bugeun Kim • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Target Speaker Extraction | PORTE (Avg.) | SI-SDRi (dB) | 4.56 | 10 |
| Target Speaker Extraction | PORTE (20% overlap) | SI-SDRi (dB) | 5.65 | 10 |
| Target Speaker Extraction | PORTE (40% overlap) | SI-SDRi (dB) | 4.66 | 10 |
| Target Speaker Extraction | PORTE (60% overlap) | SI-SDRi (dB) | 4.35 | 10 |
| Target Speaker Extraction | PORTE (80% overlap) | SI-SDRi (dB) | 3.57 | 10 |
| Target Speaker Extraction | PORTE (100% overlap) | SI-SDRi (dB) | 2.04 | 10 |
| Target Speaker Extraction | PORTE (0% overlap) | SI-SDRi (dB) | 7.13 | 10 |
| Target Speaker Extraction | PORTE (Avg.) | SuRE | 0.00e+0 | 5 |
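For readers unfamiliar with the SI-SDRi (SISDRi) results listed above, the following is a minimal sketch of the standard scale-invariant SDR and its improvement over the unprocessed mixture. It uses the commonly cited SI-SDR formulation and is not taken from the VorTEX evaluation code; the function names and the toy signals are assumptions for illustration. SuRE is not sketched because its definition is not given here.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")

    # Remove the mean so that a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the optimally scaled target.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    scaled_target = [alpha * b for b in t]
    residual = [a - s for a, s in zip(e, scaled_target)]

    signal_energy = sum(s * s for s in scaled_target)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy example: two sinusoids with integer cycle counts, hence orthogonal.
N = 1000
target = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
interferer = [math.sin(2 * math.pi * 11 * n / N) for n in range(N)]
mixture = [t + i for t, i in zip(target, interferer)]
# Hypothetical extractor output: interferer amplitude reduced by a factor of 10.
estimate = [t + 0.1 * i for t, i in zip(target, interferer)]
gain_db = si_sdr_improvement(estimate, mixture, target)
```

Because the score is scale-invariant, multiplying the estimate by a constant leaves it unchanged, so a model cannot raise SI-SDRi by adjusting output volume alone.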
