
BookSum

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Watermark Detection | BOOKSUM | TP @ FP=1%: 100 | 154 |
| Long document summarization | BookSum (test) | ROUGE 1: 43.19 | 37 |
| Watermarking Detection | BOOKSUM (test) | Detection Rate (No Attack): 100 | 24 |
| Summarization | BookSum (test) | Comp Score: 5 | 24 |
| Document Summarization | BookSum | ROUGE-1 Score: 46.62 | 22 |
| Large Language Model Watermarking | BOOKSUM (test) | Average Rank: 4.38 | 20 |
| Long-context Input (Summarization) | BookSum | TPT (s): 3.76 | 20 |
| Spoofing Attack Robustness | BookSum | AUC: 0.9552 | 20 |
| Paraphrase Attack Robustness | BookSum | AUC: 98.49 | 20 |
| Summarization | BookSum Chapter Level | ROUGE-1: 42.68 | 14 |
| Language Modeling | BookSum | Perplexity: 19.35 | 13 |
| Summarization Faithfulness | BookSum | SummaC Score: 39.84 | 12 |
| Abstractive Summarization | BookSum sampled (test) | ROUGE Score: 17.71 | 12 |
| Faithfulness Evaluation | BookSum (test) | SummaC: 40.71 | 12 |
| Grounded Payoff Tracking | BookSum | Detection Accuracy: 69.8 | 12 |
| Narrative Reasoning | BookSum oracle timing | Average Score: 94 | 12 |
| Text generation | BookSum | F1 Score: 26.5 | 10 |
| Watermarking Efficiency | BookSum | Total Time (s): 1,224.25 | 10 |
| Watermark Detection | BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test) | Latency per Token: 0.0001 | 9 |
| Watermarked Text Generation | BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test) | Latency per Token (s/tok): 0.037 | 9 |
| Summarization | BookSum | Reward: 0.277 | 6 |
| Summarization | BookSum Average latest (test) | Average ROUGE: 17.47 | 6 |
| Summarization | BookSum Trun. latest (test) | Avg ROUGE: 16.68 | 6 |
| Summarization | BookSum No Trun. latest (test) | Average ROUGE: 17.74 | 6 |
| Watermarking Token Efficiency | BOOKSUM (test) | Avg Tokens per Sentence: 186.7 | 5 |