XSum

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | XSum (test) | ROUGE-2: 60.61 | 276 |
| Summarization | Xsum | ROUGE-2: 27.1 | 108 |
| Prompt Optimization | XSum | Hypervolume (HV): 0.1626 | 72 |
| Summarization | XSum | PRR: 0.617 | 66 |
| Selective Generation | XSum | ROC-AUC: 85.9 | 66 |
| AI-generated text detection | XSum Generated by Claude3 (test) | AUROC: 100 | 60 |
| AI-generated text detection | XSum Generated by GPT-4 (test) | AUROC: 0.9996 | 60 |
| AI-generated text detection | XSum Generated by ChatGPT (test) | AUROC: 1 | 60 |
| LGT Detection | XSum Fast-DetectGPT benchmark | AUROC: 100 | 54 |
| Summarization | XSum | ROUGE-2: 9.16 | 46 |
| Abstractive Summarization | XSUM (test) | ROUGE-L: 40.4 | 44 |
| Membership Inference Attack | XSum (test) | AUC: 0.945 | 43 |
| Summarization | XSum | ROUGE-L: 27.45 | 42 |
| Differential Privacy Auditing | Xsum | Empirical Privacy Loss (epsilon): 0 | 40 |
| Machine-generated text detection | XSum | AUROC: 100 | 40 |
| Soft constrained reward optimization | XSum on Gemma-7B | Average Soft Constrained Reward: 0.147 | 36 |
| Language Generation | XSum | Accuracy: 24.89 | 35 |
| Language Modeling | XSum | Perplexity: 14.71 | 26 |
| Prompt injection attack detection | XSum | TPR: 100 | 22 |
| Abstractive Summarization | XSum | ROUGE-1: 40.67 | 22 |
| Factual Consistency Evaluation | XSum-Faithful (XSF) | Spearman Correlation: 47 | 22 |
| Factual Consistency Evaluation | XSumFaith (test) | Pearson Correlation Coefficient: 42.5 | 22 |
| Understanding | XSum | Score: 46.76 | 20 |
| Masked Language Modeling | XSUM randomly sampled | U-PPL: 3.8 | 20 |
| Summarization | XSum | Speedup vs AR: 1.84 | 19 |
Showing 25 of 111 rows