Writing

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| AI-generated text detection | Writing Generated by Claude3 (test) | AUROC 99.5 | 15 |
| AI-generated text detection | Writing Generated by GPT-4 (test) | AUROC 0.9768 | 15 |
| AI-generated text detection | Writing Generated by ChatGPT (test) | AUROC 0.9916 | 15 |
| Classification | Writing 10-shot | Accuracy 91.3 | 10 |
| Classification | Writing 5-shot | Accuracy 87 | 10 |
| Classification | Writing 3-shot | Accuracy 75.1 | 10 |
| Human Sensing | Writing 5-shot | Training Time (mins) 7.71 | 5 |
| Human Sensing | Writing 10-shot | GPU Utilization (%) 92.22 | 5 |
| Human Sensing | Writing 5-shot | GPU Utilization 85.47 | 5 |
| Human Sensing | Writing 3-shot | GPU Utilization 75.14 | 5 |
| Human Sensing | Writing | Watch Latency (ms) 467.5 | 4 |
| Idea Generation | Writing | Ideas Accepted 1,000 | 3 |
| Downstream classification | Writing Unconstrained | F1 Score 22.1 | 3 |
| Downstream classification | Writing Category-controlled top-K | F1 Score 14.2 | 3 |
Showing 14 of 14 rows