WaterBench

Benchmarks

Task Name                              Dataset Name                         SOTA Result            Trend
Summarization                          WaterBench (test)                    GM 22.03               11
Reasoning & Coding                     WaterBench (test)                    GM 59.82               11
Long-form QA                           WaterBench (test)                    GM Score 24.06         11
Diffusion Language Model Watermarking  WaterBench 600 prompts 2024          PPL 2.8                9
Text Generation Quality Evaluation     WaterBench 1000 prompts              PPL 9.878              6
Watermarking Detection                 WaterBench 1000 prompts              Completeness 98.3      5
Watermark Removal                      WaterBench 1,605 outputs (full set)  Paraphrase Score 83.3  4