
LongForm

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Watermark Detection | longform_qa | Accuracy: 100 | 48 |
| Detection Accuracy | LongForm QA | Accuracy: 99.88 | 24 |
| Long-form Fact-checking | LongForm Bio | FactScore: 0.8568 | 12 |
| Watermarking | Longform | Generation Metric: 21.9 | 6 |
| Factuality Evaluation | LongForm | Precision: 33.4 | 6 |
| Detection | LongForm | Score (gpt-5.1): 100 | 5 |
| Prevention | LongForm | Score (gpt-5.1): 100 | 5 |