Typologically-Informed Candidate Reranking for LLM-based Translation into Low-Resource Languages
About
Large language models trained predominantly on high-resource languages exhibit systematic biases toward dominant typological patterns, leading to structural non-conformance when translating into typologically divergent low-resource languages. We present a framework that leverages linguistic typology to improve translation quality without parallel training data or model retraining. The framework consists of two components: the Universal Metalinguistic Framework (UMF), which represents languages as structured profiles across 16 typological dimensions with divergence-weighted scoring, and the Computational Engine, which applies linguistic disambiguation during generation and typological compliance scoring during selection. Evaluation across nine language pairs shows that intervention rates correlate strongly with typological distance from English. In experiments on 341 English sentences, each exhibiting different morphological and syntactic phenomena, the framework achieves an intervention precision of 48.16% for conservatively treated languages, 28.15% for morphologically dense languages, and 86.26% for structurally profiled languages. Because it requires no parallel training data and works with any LLM capable of producing multiple candidate outputs, the framework is practical to deploy for under-resourced languages.
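The selection step described above (scoring candidates against a typological profile, with extra weight on dimensions where the target diverges from the source) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the profiles `SOURCE` and `TARGET`, the four dimensions shown, the `base`/`bonus` weighting scheme, and the pre-extracted candidate features are all hypothetical. The abstract does not specify how the 16 dimensions are encoded or how features are detected in a candidate translation.

```python
# Hypothetical typological profiles: dimension name -> categorical value.
# The paper uses 16 dimensions; four invented ones are shown for illustration.
SOURCE = {"word_order": "SVO", "adposition": "preposition",
          "case_marking": "none", "numeral_classifier": "absent"}
TARGET = {"word_order": "SOV", "adposition": "postposition",
          "case_marking": "suffix", "numeral_classifier": "absent"}


def divergence_weights(source, target, base=1.0, bonus=2.0):
    """Give a dimension extra weight when the target diverges from the source.

    The base/bonus values are assumptions, not taken from the paper.
    """
    return {dim: base + (bonus if source[dim] != target[dim] else 0.0)
            for dim in target}


def compliance_score(observed, target, weights):
    """Weighted share of dimensions on which a candidate matches the target."""
    total = sum(weights.values())
    matched = sum(w for dim, w in weights.items()
                  if observed.get(dim) == target[dim])
    return matched / total


def rerank(candidates, target, weights):
    """Sort (text, observed_features) pairs by descending compliance score."""
    return sorted(candidates,
                  key=lambda c: compliance_score(c[1], target, weights),
                  reverse=True)


weights = divergence_weights(SOURCE, TARGET)

# Features assumed to be extracted from each candidate by some analyzer
# (the extraction step is outside the scope of this sketch).
candidates = [
    ("candidate A", {"word_order": "SVO", "adposition": "postposition",
                     "case_marking": "suffix", "numeral_classifier": "absent"}),
    ("candidate B", {"word_order": "SOV", "adposition": "postposition",
                     "case_marking": "suffix", "numeral_classifier": "absent"}),
]

best_text, best_features = rerank(candidates, TARGET, weights)[0]
```

In this toy setup, candidate A keeps the source word order and loses the weight of that divergent dimension, so candidate B is ranked first. A real system would need a language-specific analyzer to populate the observed features.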
Related benchmarks
| Task | Dataset | Change Rate (%) | Rank |
|---|---|---|---|
| Machine Translation Reranking | English-Sinhala (Evaluation Set) | 45.16 | 1 |
| Machine Translation Reranking | English-Tamil Translation (Evaluation Set) | 26.69 | 1 |
| Machine Translation Reranking | English-Thai Translation (Evaluation Set) | 4.4 | 1 |
| Machine Translation Reranking | English-Chinese Translation (Evaluation Set) | 3.23 | 1 |
| Machine Translation Reranking | English-Hindi Translation (Evaluation Set) | 15.54 | 1 |
| Machine Translation Reranking | English-Japanese Translation (Evaluation Set) | 7.33 | 1 |
| Machine Translation Reranking | English-Arabic Translation (Evaluation Set) | 11.44 | 1 |
| Machine Translation Reranking | English-French Translation (Evaluation Set) | 9.09 | 1 |
| Machine Translation Reranking | English-Swahili Translation (Evaluation Set) | 9.68 | 1 |
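
The table does not define Change Rate. One plausible reading, offered here as an assumption only, is the percentage of evaluation sentences for which reranking selects a different translation than the model's default output. Under that reading the metric could be computed as below; the function name and the toy data are illustrative.

```python
def change_rate(baseline, reranked):
    """Percentage of sentences whose selected translation differs after reranking.

    Assumed definition; the source does not spell out how the metric is computed.
    """
    if len(baseline) != len(reranked):
        raise ValueError("both lists must cover the same sentences")
    if not baseline:
        return 0.0
    changed = sum(b != r for b, r in zip(baseline, reranked))
    return 100.0 * changed / len(baseline)


# Toy data: four sentences, reranking changes the choice for one of them.
baseline = ["t1", "t2", "t3", "t4"]
reranked = ["t1", "t2-alt", "t3", "t4"]
rate = change_rate(baseline, reranked)  # 25.0
```

This reading is at least arithmetically consistent with the 341-sentence evaluation set mentioned in the abstract: for example, 154 changed sentences out of 341 would give 45.16%, matching the English-Sinhala row.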