Czert – Czech BERT-like Model for Language Representation
About
This paper describes the training of the first Czech monolingual language representation models, based on the BERT and ALBERT architectures. We pre-train our models on more than 340K sentences, 50 times more Czech data than the multilingual models that include Czech were trained on. We outperform the multilingual models on 9 out of 11 datasets and, in addition, establish new state-of-the-art results on 9 datasets. Finally, we discuss the properties of monolingual and multilingual models in light of our results. We release all pre-trained and fine-tuned models freely for the research community.
Jakub Sido, Ondřej Pražák, Pavel Přibáň, Jan Pašek, Michal Seják, Miloslav Konopík · 2021
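Since the models are released publicly, a quick way to try one is through the Hugging Face Transformers library. Below is a minimal masked-language-model probe; the hub identifier `UWB-AIR/Czert-B-base-cased` is an assumption based on the authors' public release, so verify the exact path before use.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "UWB-AIR/Czert-B-base-cased"  # assumed hub path; check the authors' release
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Fill-in-the-blank probe: predict the token hidden behind [MASK].
text = f"Praha je hlavní město {tokenizer.mask_token} republiky."
inputs = tokenizer(text, return_tensors="pt")
logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected: a plausible completion such as "České"
```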
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Named Entity Recognition | CNEC 1.1 | F1 Score | 86.27 | 20 |
| Morphological Tagging | PDT 3.5 (test) | POS Accuracy | 98.43 | 17 |
| Lemmatization | PDT 3.5 (test) | Lemmas Accuracy | 98.98 | 16 |
| Named Entity Recognition | CNEC 2.0 | F1 Score | 85.33 | 16 |
| Joint Morphological Tagging and Lemmatization | PDT 3.5 (test) | Both Correct | 98.02 | 15 |
| Morphosyntactic Analysis | UD 2.3 | LAS | 93.13 | 15 |
| Semantic Parsing | Prague Tectogrammatical Graphs | Properties F1 | 92.69 | 11 |
| Sentiment Analysis | Czech Facebook dataset | Macro F1 (10-fold) | 78.52 | 8 |
| Morphosyntactic Analysis | PDT 3.5 | POS Accuracy | 98.43 | 7 |
| Dependency Parsing | PDT 3.5 | UAS | 93.57 | 7 |
Showing 10 of 12 benchmark results.
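The benchmark results above come from fine-tuning the pre-trained model on task-specific data. The sketch below shows the general shape of such fine-tuning for token classification (as in the NER benchmarks); the label set, example sentence, and single training step are illustrative placeholders, not the authors' exact setup, and the hub path is the same assumption as above.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_ID = "UWB-AIR/Czert-B-base-cased"      # assumed hub path
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # illustrative label subset
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(labels)
)

# One toy training step; a real run would iterate over CNEC-style splits.
sentence = ["Jan", "Novák", "bydlí", "v", "Plzni", "."]
word_labels = [1, 2, 0, 0, 3, 0]  # B-PER I-PER O O B-LOC O

enc = tokenizer(sentence, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to subword tokens: label only each word's first piece,
# and mark special tokens and continuation pieces with -100 so the loss skips them.
aligned, prev = [], None
for wid in enc.word_ids(0):
    if wid is None or wid == prev:
        aligned.append(-100)
    else:
        aligned.append(word_labels[wid])
    prev = wid

loss = model(**enc, labels=torch.tensor([aligned])).loss
loss.backward()  # a full loop would follow with an optimizer step
```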