
C4

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | C4 | Perplexity | 4.77 | 1,422 |
| Language Modeling | C4 | Perplexity | 1 | 1,071 |
| Language Modeling | C4 (val) | PPL | 5.709 | 514 |
| Language Modeling | C4 (test) | Perplexity | 4.97 | 342 |
| Language Modeling | C4 | C4 Loss | 2.55 | 121 |
| Pre-training | C4 (val) | Perplexity | 17.8 | 58 |
| Language Generation | C4 | Perplexity | 5.62 | 54 |
| Language Model Pre-training | C4 Llama 2 pre-training (val) | Perplexity | 13.19 | 47 |
| Watermarking | C4 | TPR (FPR < 10^-4) | 100 | 40 |
| Language Modeling | C4 | Entropy | 1 | 39 |
| Watermark Detection | C4 | TPR @ 1% FPR | 100 | 36 |
| Language Modeling | C4 | Log-PPL | 2.834 | 35 |
| Masked Language Modeling | C4 (val) | PPLX | 3.828 | 35 |
| Feature Space Preservation | C4 | Cosine Similarity | 100 | 32 |
| Language Modeling | C4 | Word Perplexity | 18.08 | 32 |
| Next Token Prediction | C4 (held-out) | Perplexity (PPL) | 21.5 | 30 |
| Clustering | C4 | Clustering Score | 63.95 | 30 |
| Next Token Prediction | C4 | OOD Perplexity | 21.1 | 30 |
| Language Modeling | C4 | Perplexity | 9.44 | 28 |
| Watermark Detectability | C4 RealNewsLike (Del-0.2) (test) | AUC | 99.3 | 28 |
| Language Modeling | C4 LLaMA-130M (val) | Perplexity | 18.504 | 27 |
| Language Modeling | C4 Qwen2.5 (val) | Perplexity (PPL) | 15.8 | 27 |
| Text Watermarking | C4 | PPL | 9.012 | 27 |
| Watermark Detection | C4 OPT-6.7B | ROC-AUC | 100 | 26 |
| Watermark Detection | C4 | Detection Accuracy (No Attack) | 100 | 24 |
Showing 25 of 107 rows
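Many of the language-modeling rows report perplexity or a closely related quantity (PPL, Log-PPL, C4 Loss). The sketch below assumes the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood, and assumes that Log-PPL and loss figures are natural-log values; the listing does not state which base each source used. The helper names `perplexity_from_nll` and `perplexity_from_log_ppl` are hypothetical. Because perplexity depends on the tokenizer and evaluation setup, rows produced by different models are not necessarily directly comparable.

```python
import math
from typing import Sequence


def perplexity_from_nll(token_nlls: Sequence[float]) -> float:
    """Perplexity as exp(mean per-token negative log-likelihood).

    Hypothetical helper. Assumes each NLL is in nats (natural log),
    as produced by a standard cross-entropy loss.
    """
    if len(token_nlls) == 0:
        raise ValueError("need at least one token NLL")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


def perplexity_from_log_ppl(log_ppl: float, base: float = math.e) -> float:
    """Convert a reported log-perplexity (or mean loss) back to perplexity.

    Hypothetical helper. The default assumes natural log; pass base=2
    if a source reports its loss in bits.
    """
    return base ** log_ppl


if __name__ == "__main__":
    # Uniform probability over 4 tokens gives perplexity 4 (up to float rounding).
    print(perplexity_from_nll([math.log(4)] * 3))
    # Under the natural-log assumption, Log-PPL 2.834 and a loss of 2.55 map to:
    print(round(perplexity_from_log_ppl(2.834), 2))  # 17.01
    print(round(perplexity_from_log_ppl(2.55), 2))   # 12.81
```

If a source instead reports base-2 values, the same reported number corresponds to a much smaller perplexity (for example, 2.834 bits is about 7.1), so the base matters when reading these rows side by side.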
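Several watermarking rows report a true-positive rate at a fixed false-positive budget (TPR @ 1% FPR, TPR with FPR < 10^-4). The following is a minimal sketch of one way such a number can be computed empirically. It assumes a detector that outputs one scalar score per text, plus held-out scores on unwatermarked (human-written) text. The function name `tpr_at_fpr` and its inputs are hypothetical, and individual papers may set the threshold analytically (for example from a z-score test) rather than from an empirical quantile.

```python
import math
from typing import Sequence, Tuple


def tpr_at_fpr(
    neg_scores: Sequence[float],
    pos_scores: Sequence[float],
    target_fpr: float,
) -> Tuple[float, float, float]:
    """Empirical TPR at a fixed FPR budget for a score-based detector.

    Hypothetical helper.
    neg_scores: detector scores on unwatermarked text.
    pos_scores: detector scores on watermarked text.
    A text is flagged as watermarked when its score is strictly above the
    threshold, which is chosen so that at most target_fpr of the negatives
    are flagged. Returns (tpr, achieved_fpr, threshold).
    """
    if not neg_scores or not pos_scores:
        raise ValueError("need both negative and positive scores")
    ranked = sorted(neg_scores, reverse=True)
    # At most k negatives may lie strictly above the threshold.
    k = math.floor(target_fpr * len(ranked))
    threshold = ranked[k] if k < len(ranked) else -math.inf
    fpr = sum(s > threshold for s in neg_scores) / len(neg_scores)
    tpr = sum(s > threshold for s in pos_scores) / len(pos_scores)
    return tpr, fpr, threshold


if __name__ == "__main__":
    negatives = list(range(100))     # toy scores on unwatermarked text
    positives = [50, 99.5, 120, 200] # toy scores on watermarked text
    print(tpr_at_fpr(negatives, positives, 0.01))  # (0.75, 0.01, 98)
```

With an empirical threshold, `k` stays 0 for a 10^-4 budget unless there are at least 10,000 negative samples, so the threshold falls back to the maximum negative score. This is one reason very low FPR targets are often certified with an analytic test instead.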