Lost in Speech: Benchmarking, Evaluation, and Parsing of Spoken Code-Switching Beyond Standard UD Assumptions

About

Spoken code-switching (CSW) challenges syntactic parsing in ways not observed in written text. Disfluencies, repetition, ellipsis, and discourse-driven structure routinely violate standard Universal Dependencies (UD) assumptions, causing parsers and large language models (LLMs) to fail despite strong performance on written data. These failures are compounded by rigid evaluation metrics that conflate genuine structural errors with acceptable variation. In this work, we present a systems-oriented approach to spoken CSW parsing. We introduce a linguistically grounded taxonomy of spoken CSW phenomena and SpokeBench, an expert-annotated gold benchmark designed to test spoken-language structure beyond standard UD assumptions. We further propose FLEX-UD, an ambiguity-aware evaluation metric, which reveals that existing parsing techniques perform poorly on spoken CSW by penalizing linguistically plausible analyses as errors. We then propose DECAP, a decoupled agentic parsing framework that isolates spoken-phenomena handling from core syntactic analysis. Experiments show that DECAP produces more robust and interpretable parses without retraining and achieves up to 52.6% improvements over existing parsing techniques. FLEX-UD evaluations further reveal qualitative improvements that are masked by standard metrics.

Nemika Tyagi, Holly Hendrix, Nelvin Licona-Guevara, Justin Mackie, Phanos Kareen, Muhammad Imran, Megan Michelle Smith, Tatiana Gallego Hernande, Chitta Baral, Olga Kellert • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Syntactic Parsing | SpokeBench 1.0 (test) | LAS 0.39 | 33 |
| Universal Dependency Parsing | SpokeBench Contr. (EN) v1 (test) | ID Score 72.5 | 3 |
| Universal Dependency Parsing | SpokeBench Contr. (ES) v1 (test) | ID Score 80 | 3 |
| Universal Dependency Parsing | SpokeBench Repetition v1 (test) | ID Score 72 | 3 |
| Universal Dependency Parsing | SpokeBench Repetition+ v1 (test) | ID Score 70 | 3 |
| Universal Dependency Parsing | SpokeBench Ellipses v1 (test) | ID Score 51.8 | 3 |
| Universal Dependency Parsing | SpokeBench Ellipses+ v1 (test) | ID Score 60 | 3 |
| Universal Dependency Parsing | SpokeBench Discourse v1 (test) | ID Accuracy 63.5 | 3 |
| Universal Dependency Parsing | SpokeBench Discourse+ v1 (test) | ID Accuracy 60.4 | 3 |
| Universal Dependency Parsing | SpokeBench Complex v1 (test) | ID Accuracy 55.8 | 3 |
Showing 10 of 12 rows
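
For readers unfamiliar with the LAS figure in the first row, the sketch below shows how the standard labeled attachment score is conventionally computed for a single sentence: the fraction of tokens whose predicted head and dependency label both match the gold parse. It is an illustration only. The function name `attachment_scores`, the `(head, label)` token representation, and the toy sentence are assumptions made here, not the paper's evaluation code, and this page does not say how the reported 0.39 was aggregated. The sketch also does not implement FLEX-UD, whose ambiguity-aware scoring is not specified on this page.

```python
from typing import List, Tuple

# Hypothetical representation: each token is (head_index, dependency_label),
# with head index 0 standing for the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions in [0, 1] for one sentence."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS counts tokens with the correct head only.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens with the correct head AND the correct label.
    label_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, label_hits / n


# Toy four-token sentence (made up for illustration).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "discourse")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "discourse")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In the toy sentence, the third token has the right head but the wrong label and the fourth has the right label but the wrong head, so both count as LAS errors. A strict metric of this kind treats every mismatch the same way, which is the rigidity the abstract says FLEX-UD is meant to relax.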
