Ab Antiquo: Neural Proto-language Reconstruction

About

Historical linguists have identified regularities in the process of historical sound change. The comparative method utilizes those regularities to reconstruct proto-words based on observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages and has to predict the proto-word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform conventional methods applied to this task so far. Error analysis reveals variability in the ability of neural models to capture different phonological changes, correlating with the complexity of the changes. Analysis of learned embeddings reveals that the models learn phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented by historical linguistics.
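To make the task concrete, here is a minimal sketch of how a predicted proto-word might be scored against the attested form. It assumes the PED figures reported on this page are an edit distance between predicted and gold forms; the abstract does not define the metric, so treat this as an illustration only. The function names, the cognate set, and the predicted form are all hypothetical and are not taken from the paper or its dataset.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(pred: Sequence[str], gold: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (characters or phonemes)."""
    # prev[j] holds the distance between the processed prefix of pred and gold[:j].
    prev = list(range(len(gold) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]  # deleting all i symbols of pred to reach the empty prefix
        for j, g in enumerate(gold, start=1):
            cost = 0 if p == g else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_edit_distance(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Average edit distance over (predicted, gold) pairs."""
    pairs = list(pairs)
    return sum(edit_distance(p, g) for p, g in pairs) / len(pairs)


# Hypothetical cognate set: daughter-language forms as model input,
# the ancestor form as the target. Not an entry from the paper's dataset.
cognates = {"Italian": "notte", "Spanish": "noche", "French": "nuit"}
gold_proto = "noctem"
predicted_proto = "nocte"  # an imagined model output

print(edit_distance(predicted_proto, gold_proto))
```

Strings are compared character by character here; passing lists of phoneme symbols instead would give a phoneme-level distance with the same function.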

Carlo Meloni, Shauli Ravfogel, Yoav Goldberg • 2019

Related benchmarks

Task                       Dataset           Result       Rank
Linguistic Reconstruction  Rom-phon          PED 0.967    10
Linguistic Reconstruction  Sinitic           PED 1.072    6
Protoform reconstruction   Sinitic           PED 1.072    6
Linguistic Reconstruction  Rom-orth          PED 0.5958   5
Protoform reconstruction   Rom-orth          PED 0.5958   5
Word Reconstruction        Romance Rom-phon  PED 1.4581   5
Word Reconstruction        Romance Rom-orth  PED 1.3189   5