Chinese

Benchmarks

Task Name	Dataset Name	SOTA Result
Joint Word Segmentation and POS Tagging	Chinese (test)	F1 Score 84.5	36
Bilingual Response Generation	Chinese zh	QA-F1 55.5	24
Disease Classification	Chinese (external val)	AUROC 100	18
Incremental BPE Tokenization	Chinese	End-to-end CPU Time (s) 0.815	15
BPE Tokenization	Chinese	Speedup Factor 1.59	12
LLM-as-a-judge Evaluation	Chinese (test)	Overall Score 82.7	7
Unsupervised Constituency Parsing	Chinese (test)	SF1 53.92	7
Audio-driven Facial Animation	Chinese (test)	MSE 0.011	5
Dysarthria Detection	Chinese subset	Accuracy 93.87	5
Alzheimer's Detection	Chinese	Accuracy 94.4	5
Depression Detection	Chinese	Accuracy 94.41	5
Singing Voice Synthesis	Chinese SVS	WER 7.4	5
Multilingual Language Understanding	Chinese	Average Performance 68.7	5
Speaker Diarization	Chinese Hard	DER 10.18	5
Speaker Diarization	Chinese	DER 8.325	5
Reference-based Quality Estimation	Chinese (ZH)	R_pb Score 0.84	5
LaTeX OCR	Chinese Handwritten	NED 0.528	4
LaTeX OCR	Chinese (Printed)	NED 0.064	4
Zero-shot Text-to-Speech	Chinese Speech Emotion Prompt	WER 0.0162	4
Named Entity Recognition	Chinese (test)	F1 Score 68.59	4
General Language Understanding	Chinese Grouped General-Purpose Metrics	Chinese Accuracy 81.79	3
Tokenization	Chinese	Average Tokens per Sample 914.05	3
Vector Font Reconstruction	Chinese CN (test)	Error 8	3
Definition Generation	Chinese (test)	Accuracy (a1) 3	3
Simple Definition Generation	Chinese (test)	L1-3 Rate 48.03	2
