
LongForm

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Watermark Detection | longform_qa | Accuracy 100 | 48 |
| Detection Accuracy | LongForm QA | Accuracy 99.88 | 24 |
| Factuality Evaluation | LongForm | Precision 33.4 | 6 |
| Detection | LongForm | Score (gpt-5.1) 100 | 5 |
| Prevention | LongForm | Score (gpt-5.1) 100 | 5 |