
A reproduction of Apple's bi-directional LSTM models for language identification in short strings

About

Language Identification is the task of identifying a document's language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.

Mads Toftrup, Søren Asger Sørensen, Manuel R. Ciosici, Ira Assent • 2021

Related benchmarks

Task                     Dataset        Result             Rank
Language Identification  OpenSubtitles  wF1 91.38          8
Language Identification  UD             Weighted F1 87.41  4
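
The benchmark results are reported as weighted F1 (wF1). The sketch below shows how such a score can be computed, assuming the usual definition: F1 is computed per language and then averaged with weights proportional to each language's number of true examples. The function name `weighted_f1` and the toy language labels are illustrative assumptions, not taken from the paper or its evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")

    support = Counter(y_true)    # true examples per language
    predicted = Counter(y_pred)  # predictions per language
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n * f1  # weight each language's F1 by its support
    return total / len(y_true)


# Toy example (hypothetical labels): one Danish string is mistaken for Norwegian.
y_true = ["da", "da", "no", "sv"]
y_pred = ["da", "no", "no", "sv"]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Because the weights follow class frequency, languages with many test strings dominate the score, so confusion between two closely related, frequent languages lowers wF1 more than the same confusion between rare ones.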
