
Neural Word Segmentation with Rich Pretraining

About

Neural word segmentation research has benefited from large-scale raw texts by leveraging them for pretraining character and word embeddings. On the other hand, statistical segmentation research has exploited richer sources of external information, such as punctuation, automatic segmentation and POS. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model, pretraining the most important submodule using rich external sources. Results show that such pretraining significantly improves the model, leading to accuracies competitive with the best methods on six benchmarks.

Jie Yang, Yue Zhang, Fei Dong • 2017

Related benchmarks

Task                        Dataset   Result (F1 Score)   Rank
Chinese Word Segmentation   PKU       96.3                5
Chinese Word Segmentation   CITYU     96.9                5
Chinese Word Segmentation   MSR       97.5                5
Chinese Word Segmentation   AS        95.7                5
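
For readers unfamiliar with the metric, the sketch below shows how a word-level F1 score is commonly computed for segmentation, assuming the usual convention that a predicted word counts as correct only when both of its boundaries match the gold segmentation. The function names `to_spans` and `segmentation_f1` and the example sentence are illustrative assumptions and are not taken from the paper or its evaluation scripts. The function returns fractions in [0, 1], whereas the scores listed above appear to be percentages.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold, pred):
    """Return (precision, recall, F1) for one sentence.

    Assumption: a predicted word is correct only if its exact character
    span also appears in the gold segmentation.
    """
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and pred must segment the same character sequence")

    gold_spans = to_spans(gold)
    pred_spans = to_spans(pred)
    correct = len(gold_spans & pred_spans)

    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the prediction wrongly splits one two-character word.
gold = ["我", "喜欢", "北京"]
pred = ["我", "喜", "欢", "北京"]
precision, recall, f1 = segmentation_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here two of the four predicted words match gold spans, so precision is 2/4 and recall is 2/3. Benchmark scores are normally aggregated over a whole test set by summing the counts across sentences before computing the ratios; this per-sentence version is kept short for illustration.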
