Xsum

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | XSum (test) | ROUGE-2 60.61 | 231 |
| Summarization | Xsum | ROUGE-2 27.1 | 108 |
| AI-generated text detection | XSum Generated by Claude3 (test) | AUROC 100 | 60 |
| AI-generated text detection | XSum Generated by GPT-4 (test) | AUROC 0.9996 | 60 |
| AI-generated text detection | XSum Generated by ChatGPT (test) | AUROC 1 | 60 |
| LGT Detection | XSum Fast-DetectGPT benchmark | AUROC 100 | 54 |
| Abstractive Summarization | XSUM (test) | ROUGE-L 40.4 | 44 |
| Membership Inference Attack | XSum (test) | AUC 0.945 | 43 |
| Language Generation | XSum | Accuracy 24.89 | 35 |
| Language Modeling | XSum | Perplexity 14.71 | 26 |
| Factual Consistency Evaluation | XSum-Faithful (XSF) | Spearman Correlation 47 | 22 |
| Factual Consistency Evaluation | XSumFaith (test) | Pearson Correlation Coefficient 42.5 | 22 |
| Understanding | XSum | Score 46.76 | 20 |
| Masked Language Modeling | XSUM randomly sampled | U-PPL 3.8 | 20 |
| LLM-generated text detection | XSum Claude3 Opus | TPR @ FPR 1% 97.3 | 18 |
| LLM-generated text detection | XSum GPT4 Turbo | TPR @ FPR 1% 99.3 | 18 |
| LLM-generated text detection | XSum GPT4 | TPR @ FPR 1% 79.3 | 18 |
| LLM-generated text detection | XSum GPT3.5 Turbo | TPR @ FPR 1% 96.7 | 18 |
| Abstractive Summarization | XSum | ROUGE-1 40.67 | 18 |
| Abstractive Summarization | XSum (val) | ROUGE-2 0.1624 | 16 |
| Summarization | XSUM in-domain (test) | D3 Score 20 | 16 |
| Summarization | XSum | ROUGE-2 27.1 | 14 |
| Summarization | XSum | ROUGE-Lsum 22.47 | 14 |
| Abstractive Summarization | XSum | XSum Score 17.9 | 14 |
| Summarization | XSum | ROUGE-1 19.67 | 12 |
Showing 25 of 65 rows