C4

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | C4 | Perplexity 4.77 | 1,182 |
| Language Modeling | C4 (val) | PPL 5.709 | 392 |
| Language Modeling | C4 | Perplexity 1 | 321 |
| Language Modeling | C4 (test) | Perplexity 4.97 | 268 |
| Language Modeling | C4 | C4 Loss 2.55 | 73 |
| Language Model Pre-training | C4 Llama 2 pre-training (val) | Perplexity 13.19 | 47 |
| Language Modeling | C4 | Log-PPL 2.834 | 35 |
| Masked Language Modeling | C4 (val) | PPLX 3.828 | 35 |
| Watermark Detectability | C4 RealNewsLike (Del-0.2) (test) | AUC 99.3 | 28 |
| Language Modeling | C4 LLaMA-130M (val) | Perplexity 18.504 | 27 |
| Language Modeling | C4 Qwen2.5 (val) | Perplexity (PPL) 15.8 | 27 |
| Text Watermarking | C4 | PPL 9.012 | 27 |
| Watermark Detection | c4 subset | Accuracy 100 | 24 |
| Detection Accuracy | c4 subset | Accuracy 100 | 24 |
| Watermark Detection | C4 subset | Accuracy 100 | 24 |
| Language Model Pre-training | C4 Llama-160M scratch (val) | Validation Loss 3.0908 | 20 |
| Spoofing Attack Robustness | C4 RealNewsLike | AUC 0.9284 | 20 |
| Paraphrase Attack Robustness | C4 RealNewsLike | AUC 0.9871 | 20 |
| Multi-bit LLM Watermarking | C4 GEMMA2-9B-BASE Max 256 Tokens | AUC 1 | 20 |
| Multi-bit LLM Watermarking | C4 GEMMA2-9B-BASE Max 128 Tokens | AUC 100 | 20 |
| Multi-bit LLM Watermarking | C4 LLaMA3-8B-BASE Max 256 Tokens | AUC 100 | 20 |
| Multi-bit LLM Watermarking | C4 LLaMA3-8B-BASE Max 128 Tokens | AUC 1 | 20 |
| Language Modeling | C4 T5 (val) | PPLX 15.82 | 20 |
| Watermark Segment Classification | C4 Mistral-7B (val) | TPR 100 | 18 |
| Watermark Segment Classification | C4 Llama-7B (val) | TPR 100 | 18 |
Showing 25 of 71 rows