Swa-bhasha Resource Hub: Romanized Sinhala to Sinhala Transliteration Systems and Data Resources

About

The Swa-bhasha Resource Hub provides a comprehensive collection of data resources and algorithms developed for Romanized Sinhala to Sinhala transliteration between 2020 and 2025. These resources have played a significant role in advancing research in Sinhala Natural Language Processing (NLP), particularly in training transliteration models and developing applications that involve Romanized Sinhala. The datasets and corresponding tools that are currently openly accessible are released through this hub. This paper presents a detailed overview of the resources contributed by the authors and includes a comparative analysis of existing transliteration applications in the domain.

Deshan Sumanathilaka, Sameera Perera, Sachithya Dharmasiri, Maneesha Athukorala, Anuja Dilrukshi Herath, Rukshan Dias, Pasindu Gamage, Ruvan Weerasinghe, Y.H.P.P. Priyadarshana • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Romanized Sinhala to Sinhala Transliteration | SinMix2Mono Golden dataset 1.0 (test) | BLEU 49.7 | 11 |
| Back-transliteration | IndoNLP (Set 1) | -- | 6 |
| Back-transliteration | IndoNLP (Set 2) | -- | 6 |
| Transliteration disambiguation | Transliteration disambiguation Dataset (Set 1) | -- | 5 |
| Transliteration disambiguation | Transliteration disambiguation Dataset (Set 2) | -- | 5 |
| Code-Mixed Transliteration Ambiguity Resolution | SinMix2Mono Code-Mixed transliteration ambiguity (test) | BLEU 49.29 | 4 |
| Transliteration Ambiguity Resolution | Sinhala transliteration ambiguity dataset 2025a (test) | BLEU 77.64 | 4 |
