
RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model

About

We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
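Since the model is published on the Hugging Face Hub under `ufal/robeczech-base`, it can be loaded with the `transformers` library. The sketch below shows one way to query it as a masked language model; the model id comes from the release link above, while the example sentence and the fill-mask usage are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: querying RobeCzech as a masked language model via the
# `transformers` fill-mask pipeline. The model id "ufal/robeczech-base" is
# the public release named above; the Czech example sentence is ours.
from transformers import pipeline

# RobeCzech is a RoBERTa-style masked language model, so the generic
# fill-mask pipeline applies directly (downloads the model on first use).
fill_mask = pipeline("fill-mask", model="ufal/robeczech-base")

# Read the mask token from the tokenizer instead of hard-coding it.
mask = fill_mask.tokenizer.mask_token

# "Praha je hlavní město [MASK]." -- "Prague is the capital of [MASK]."
predictions = fill_mask(f"Praha je hlavní město {mask}.")
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
```

Each prediction is a dict with the candidate token (`token_str`), its probability (`score`), and the completed sentence (`sequence`).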

Milan Straka, Jakub Náplava, Jana Straková, David Samuel • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Named Entity Recognition | CNEC 1.1 | F1 Score | 87.82 | 20 |
| Morphological Tagging | PDT 3.5 (test) | POS Accuracy | 98.5 | 17 |
| Lemmatization | PDT 3.5 (test) | Lemmas Accuracy | 99 | 16 |
| Named Entity Recognition | CNEC 2.0 | F1 Score | 0.8749 | 16 |
| Joint Morphological Tagging and Lemmatization | PDT 3.5 (test) | Both Correct | 98.11 | 15 |
| Morphosyntactic Analysis | UD 2.3 | LAS | 93.77 | 15 |
| Semantic Parsing | Prague Tectogrammatical Graphs | Properties F1 | 93.58 | 11 |
| Sentiment Analysis | Czech Facebook dataset | Macro F1 (10-fold) | 80.13 | 8 |
| Dependency Parsing | PDT 3.5 | UAS | 94.14 | 7 |
| Morphosyntactic Analysis | PDT 3.5 | POS Accuracy | 98.5 | 7 |

(10 of 12 rows shown)
