Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

About

We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks in over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines on sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing, while maintaining competitive performance on tokenization, multi-word token expansion, and lemmatization across 90 Universal Dependencies treebanks. Despite its use of a large pretrained transformer, the toolkit remains efficient in memory usage and speed. This is achieved by a novel plug-and-play mechanism with Adapters, in which a single multilingual pretrained transformer is shared across the pipelines for different languages. The toolkit, along with pretrained models and code, is publicly available at: https://github.com/nlp-uoregon/trankit. A demo website is available at: http://nlp.uoregon.edu/trankit. A demo video for Trankit is available at: https://youtu.be/q0KGP3zGjGc.
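
The plug-and-play idea described in the abstract can be illustrated with a toy model. The sketch below is not Trankit's implementation: `SharedEncoder`, `Adapter`, and `ToyPipeline` are hypothetical names, the "encoder" is a single random layer standing in for the multilingual transformer, and the sizes are arbitrary. It only shows why sharing one encoder and swapping small per-language adapters keeps the parameter count low.

```python
import math
import random
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SharedEncoder:
    """Hypothetical stand-in for the shared multilingual transformer."""

    def __init__(self, hidden: int, rng: random.Random) -> None:
        self.weight = rand_matrix(hidden, hidden, rng)

    def __call__(self, x: Vector) -> Vector:
        return [math.tanh(v) for v in matvec(self.weight, x)]

    def num_params(self) -> int:
        return sum(len(row) for row in self.weight)


class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""

    def __init__(self, hidden: int, bottleneck: int, rng: random.Random) -> None:
        self.down = rand_matrix(bottleneck, hidden, rng)
        self.up = rand_matrix(hidden, bottleneck, rng)

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.down, h)]
        return [a + b for a, b in zip(h, matvec(self.up, z))]

    def num_params(self) -> int:
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)


class ToyPipeline:
    """One shared encoder plus one small adapter per language."""

    def __init__(self, hidden: int = 64, bottleneck: int = 4, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.hidden = hidden
        self.bottleneck = bottleneck
        self.encoder = SharedEncoder(hidden, self.rng)  # created once, reused
        self.adapters: Dict[str, Adapter] = {}
        self.active: Optional[str] = None

    def add(self, lang: str) -> None:
        """Plug in a language by adding only its adapter."""
        self.adapters[lang] = Adapter(self.hidden, self.bottleneck, self.rng)
        if self.active is None:
            self.active = lang

    def set_active(self, lang: str) -> None:
        if lang not in self.adapters:
            raise KeyError(f"no adapter loaded for {lang!r}")
        self.active = lang

    def encode(self, x: Vector) -> Vector:
        if self.active is None:
            raise RuntimeError("add a language before encoding")
        return self.adapters[self.active](self.encoder(x))

    def num_params(self) -> int:
        adapter_total = sum(a.num_params() for a in self.adapters.values())
        return self.encoder.num_params() + adapter_total


pipeline = ToyPipeline()
for lang in ("english", "french", "chinese"):
    pipeline.add(lang)

pipeline.set_active("french")
features = pipeline.encode([1.0] * pipeline.hidden)

# Shared encoder counted once: 64*64 + 3 * (2 * 64 * 4) parameters.
shared_total = pipeline.num_params()
# Three independent copies of encoder + adapter, for comparison.
separate_total = 3 * (64 * 64 + 2 * 64 * 4)
print(shared_total, separate_total)
```

In this toy setting each additional language costs only the adapter's parameters, whereas separate per-language models would each duplicate the encoder. The real toolkit's savings depend on its actual architecture and adapter sizes, which this sketch does not attempt to reproduce.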

Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Named Entity Recognition | CoNLL English 2003 (test) | F1 Score | 92.1 | 135 |
| Named Entity Recognition | CoNLL Spanish NER 2002 (test) | F1 Score | 88.9 | 98 |
| Named Entity Recognition | CoNLL Dutch 2002 (test) | F1 Score | 91.8 | 87 |
| Named Entity Recognition | CoNLL German 2003 (test) | F1 Score | 84.6 | 78 |
| Model Size Evaluation | Multilingual Language Packages | Model Size (MB) | 37.3 | 13 |
| Named Entity Recognition | English OntoNotes (test) | Entity micro-F1 | 89.6 | 7 |
| Neural Pipeline | Universal Dependencies French-GSD v2.5 (test) | Token Coverage | 99.7 | 7 |
| Named Entity Recognition | English Ontonotes NER (test) | Relative Processing Time | 1.36 | 6 |
| Neural Pipeline | Universal Dependencies Chinese-GSD 2.5 (test) | Token Accuracy | 97.01 | 5 |
| Universal Dependencies | English EWT treebank Universal Dependencies (test) | Relative Processing Time | 4.5 | 5 |
Showing 10 of 20 rows
