
Chat

Benchmarks

Task Name                    Dataset Name                 SOTA Result                          Trend
Safety Detection             chat 1m (test)               MCA Accuracy 100                     21
Chat                         Chat                         Chat Score 49.3                      8
Sleep staging                CHAT                         AUC 98.4                             7
Safety Classification        Chat 1m-Conv                 MCA 99                               6
Safety Classification        Chat 1m                      MCA 100                              6
Text Summarization           Chat (test)                  ROUGE-1 28.23                        6
Computational cost analysis  chat 1m                      Inference Latency (per prompt) 0.01  5
Sleep Staging                chat in-distribution (test)  Macro F1 (Mean) 86                   4
Sleep Stage Classification   CHAT                         Macro F1 86                          2