
XSum

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | XSum (test) | ROUGE-2 60.61 | 246 |
| Summarization | Xsum | ROUGE-2 27.1 | 108 |
| AI-generated text detection | XSum Generated by Claude3 (test) | AUROC 100 | 60 |
| AI-generated text detection | XSum Generated by GPT-4 (test) | AUROC 0.9996 | 60 |
| AI-generated text detection | XSum Generated by ChatGPT (test) | AUROC 1 | 60 |
| LGT Detection | XSum Fast-DetectGPT benchmark | AUROC 100 | 54 |
| Summarization | XSum | ROUGE-2 9.16 | 46 |
| Abstractive Summarization | XSUM (test) | ROUGE-L 40.4 | 44 |
| Membership Inference Attack | XSum (test) | AUC 0.945 | 43 |
| Language Generation | XSum | Accuracy 24.89 | 35 |
| Machine-generated text detection | XSum | AUROC 100 | 30 |
| Language Modeling | XSum | Perplexity 14.71 | 26 |
| Prompt injection attack detection | XSum | TPR 100 | 22 |
| Abstractive Summarization | XSum | ROUGE-1 40.67 | 22 |
| Factual Consistency Evaluation | XSum-Faithful (XSF) | Spearman Correlation 47 | 22 |
| Factual Consistency Evaluation | XSumFaith (test) | Pearson Correlation Coefficient 42.5 | 22 |
| Understanding | XSum | Score 46.76 | 20 |
| Masked Language Modeling | XSUM randomly sampled | U-PPL 3.8 | 20 |
| LLM Text Attribution | XSum | TPR (FPR=0.01) 100 | 18 |
| LLM-generated text detection | XSum Claude3 Opus | TPR @ FPR 1% 97.3 | 18 |
| LLM-generated text detection | XSum GPT4 Turbo | TPR @ FPR 1% 99.3 | 18 |
| LLM-generated text detection | XSum GPT4 | TPR @ FPR 1% 79.3 | 18 |
| LLM-generated text detection | XSum GPT3.5 Turbo | TPR @ FPR 1% 96.7 | 18 |
| Abstractive Summarization | XSum (val) | ROUGE-2 0.1624 | 16 |
| Summarization | XSUM in-domain (test) | D3 Score 20 | 16 |
Showing 25 of 77 rows