
BERTje: A Dutch BERT Model

About

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis). Our pre-trained Dutch BERT model is made available at https://github.com/wietsedv/bertje.
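Since the pre-trained model is published, it can be loaded through the Hugging Face Transformers library. The sketch below assumes the checkpoint id `GroNLP/bert-base-dutch-cased`; check the linked GitHub repository for the currently documented model name.

```python
# Minimal sketch: loading BERTje with Hugging Face Transformers.
# The model id below is an assumption; verify it against the repository docs.
MODEL_ID = "GroNLP/bert-base-dutch-cased"


def load_bertje():
    """Download and return the BERTje tokenizer and masked-LM model.

    Requires the `transformers` package and network access on first call;
    weights are cached locally afterwards.
    """
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_bertje()
    # Example Dutch input, tokenized for the model.
    inputs = tokenizer("Dit is een voorbeeldzin.", return_tensors="pt")
    outputs = model(**inputs)
```

For downstream tasks such as NER or sentiment analysis, the same checkpoint can instead be loaded with `AutoModelForTokenClassification` or `AutoModelForSequenceClassification` and fine-tuned.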

Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim • 2019

Related benchmarks

Task | Dataset | Metric | Result | Rank
Named Entity Recognition | CoNLL-2002 (test) | F1 Score | 88.3 | 7
Die/Dat Disambiguation | Europarl | Accuracy | 98.268 | 5
Part-of-Speech Tagging | Lassy UD (test) | Accuracy | 96.3 | 5
Sentiment Analysis | 110k Dutch Book Reviews Dataset (test) | Accuracy | 93 | 4
Die/Dat Disambiguation | Europarl 10k | Accuracy | 93.096 | 4
Sentiment Analysis | DBRD (Full dataset) | Accuracy | 93 | 4
Part-of-Speech Tagging | UD-LassySmall 2.5 (train) | Accuracy | 99.6 | 3
Part-of-Speech Tagging | UD-LassySmall 2.5 (dev) | Accuracy | 96.8 | 3
Part-of-Speech Tagging | UD-LassySmall 2.5 (test) | Accuracy | 96.6 | 3
Part-of-Speech Tagging | SoNaR-1 coarse (train) | Accuracy | 99.8 | 3
(10 of 25 benchmark rows shown.)
