CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions
About
CHILDES is a key resource for language acquisition studies, yet computational tools for analyzing its syntactic structure remain limited. Leveraging the recent release of the UD-English-CHILDES treebank with gold-standard Universal Dependencies (UD) annotations, we train a state-of-the-art dependency parser specifically tailored to CHILDES. The parser captures syntactic patterns in child-adult interactions more accurately than widely used off-the-shelf English parsers, including spaCy and Stanza. Alongside the parser, we also release a part-of-speech tagger and an utterance-level construction tagger, which together form the open-source Syntactic Parsing Toolkit for Child-Adult InTeractions (CAIT). Through a detailed error analysis and a case study tracking the distribution of syntactic constructions across developmental time in CHILDES, we demonstrate the practical utility of the toolkit for large-scale, reproducible research on language acquisition.
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Dependency Parsing | UD CHILDES (dev) | UAS | 96.23 | 13 |
| Dependency Parsing | UD CHILDES (test) | UAS | 94.91 | 13 |
| Construction Tagging | CHILDES MPI-EVA-Manchester (test) | Accuracy | 92.32 | 4 |
| Construction Tagging | MPI-EVA-Manchester (CHILDES) (dev) | Accuracy | 92.05 | 3 |
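
The UAS figures in the table are unlabeled attachment scores: the share of tokens whose predicted syntactic head matches the gold head, ignoring the dependency label. The sketch below shows that computation for a single utterance. It is only an illustration: the function name, the example utterance, and its head indices are hypothetical and not part of the CAIT toolkit, and the evaluation behind the reported numbers may differ in details such as punctuation handling or averaging over the corpus.

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Return UAS as a percentage for one sequence of tokens.

    Each list holds one head index per token, using the UD convention
    that 0 marks the root. Labels are ignored by design.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    # Count tokens whose predicted head equals the gold head.
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


# Hypothetical utterance "you want the ball" (illustrative head indices).
gold = [2, 0, 4, 2]
pred = [2, 0, 2, 2]  # the parser attaches "the" to the wrong head
score = unlabeled_attachment_score(gold, pred)
print(f"UAS: {score:.2f}")
```

Here three of the four heads match, so the score for this utterance is 75.00. Corpus-level UAS is normally computed over all tokens pooled together rather than by averaging per-utterance scores.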